Netflix-Skunkworks / Scumblr

Web framework that allows performing periodic syncs of data sources and performing analysis on the identified results
Apache License 2.0
2.64k stars 317 forks

Request for multi-parameter searches from one defined task #141

Closed: espressobeanies closed this issue 7 years ago

espressobeanies commented 7 years ago

Good afternoon,

I'd like Scumblr to be able to search for one or more keywords across one or more search providers from a single defined task. I can currently do this in Scumblr, but only by creating multiple tasks, each restricted to a single query and a single search provider, which becomes tedious. If that isn't doable, searching one keyword across multiple search providers would at least help.

Thanks for the awesome tool!

scoope3 commented 7 years ago

I couldn't agree more. This would be an awesome feature to have.

sbehrens commented 7 years ago

Hi @mars01 and @scoope3,

You should be able to do this using the System Metadata model, which I added in the last minor release update.

If you'd like to do this with either the Github or Curl tasks, take a look at this wiki page:

https://github.com/Netflix/Scumblr/wiki/System-Metadata

If you are talking about a legacy search provider, you will have to extend the task slightly.

Here is an example code block you could add to a task that leverages the new SystemMetadata model:

Put this in lib/search_providers/your_provider.rb (in the def self.options code block):

...
      :saved_terms => {name: "System Metadata Saved Search Strings",
                       description: "Use system metadata payloads to provide shared search terms. Expects metadata to be in JSON array format.",
                       required: false,
                       type: :system_metadata},
...

When you actually want to load the saved search terms you could do the following in the initialize method:

    if @options[:saved_terms].present?
      begin
        # Look up the selected System Metadata record and read its payload.
        saved_terms = SystemMetadata.where(id: @options[:saved_terms]).try(:first).metadata
      rescue => e
        saved_terms = nil
        create_event("Could not parse System Metadata for saved terms, skipping. \n Exception: #{e.message}\n#{e.backtrace}", "Error")
      end

      unless saved_terms.kind_of?(Array)
        saved_terms = nil
        create_event("System Metadata payloads should be in array format, exp: [\"foo\", \"bar\"]", "Error")
      end

      # If there are saved terms, load them.
      if saved_terms.present?
        @search_terms.concat(saved_terms)
        # Drop empty entries from the combined list.
        @search_terms = @search_terms.reject(&:blank?)
      end
    end
  end

espressobeanies commented 7 years ago

Got it. One question, though: once I create the metadata keys, I'm not seeing how to reference them in search tasks. Can these keys be referenced in Google, Facebook, and Twitter searches, or is this limited to Curl and Github?

Thanks,

sbehrens commented 7 years ago

For the legacy search providers you'll need to follow the steps above (adding it to the google.rb task). Those steps will add an option to the self.options codeblock (which will then show you the system metadata in the UI), and then you'll have to add the other code block to actually retrieve those search terms, as shown in the second block.
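As a rough illustration of the retrieval step, here is a minimal standalone sketch of merging saved terms into a task's own search terms. It runs outside Scumblr and Rails, so `merge_saved_terms` is a hypothetical helper (not part of Scumblr), the payload is assumed to arrive either as a JSON array string or as an already-parsed array, and the de-duplication with `uniq` is an addition that the snippet earlier in this thread does not do.

```ruby
require "json"

# Hypothetical helper (not part of Scumblr): merge terms saved as a JSON
# array into a task's own search terms.
def merge_saved_terms(search_terms, payload)
  # Accept either a raw JSON string or an already-parsed value.
  saved = payload.is_a?(String) ? JSON.parse(payload) : payload

  # Mirror the check from the thread: the payload must be an array.
  return search_terms.dup unless saved.is_a?(Array)

  (search_terms + saved)
    .map { |term| term.to_s.strip }
    .reject(&:empty?)   # drop blank entries
    .uniq               # assumption: duplicates are not wanted
rescue JSON::ParserError
  # Unparseable payloads are skipped; the original terms are kept.
  search_terms.dup
end

terms = merge_saved_terms(["netflix.com"], '["api_key", "password", ""]')
puts terms.inspect
# => ["netflix.com", "api_key", "password"]
```

Inside an actual provider, the Rails `blank?` and `present?` helpers and `create_event` would take the place of the plain-Ruby checks used here.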

sbehrens commented 7 years ago

We have currently only added this to the Github and Curl tasks, although it's pretty easy to extend it to the other tasks as well.

espressobeanies commented 7 years ago

Gotcha. Okay, thanks sbehrens.