Closed Kefaun2601 closed 3 years ago
This is looking better, @kyan2601, thanks.
Confirmed that I can run a crawl job and view the data on the Banana dashboard. I did not notice any regressions in functionality that may have arisen from the implemented Solr abstractions.
Commands run (as per the repo README):
# Start in the sparkler-core directory
cd sparkler-core
# Run script to start docker container and forward ports to host
bash ./bin/dockler.sh
# Inject seed urls
/data/sparkler/bin/sparkler.sh inject -id 1 -su 'http://www.bbc.com/news'
# Start the crawl job
/data/sparkler/bin/sparkler.sh crawl -id 1 -tn 100 -i 2
Access the Banana dashboard at http://localhost:8983/banana/ to see the data.
Hi @Kefaun2601, this PR and branch have a conflict which must be resolved. Once that's done, please tag me and I will test it out. Thank you.
@lewismc Resolved the merge conflicts. Could you please review it? Thanks!
Added documentation for specifying the storage engine to use in the config file (sparkler-default.yaml): https://github.com/USCDataScience/sparkler/wiki/Specifying-CrawlDB-in-Config
Documentation on the StorageProxyFactory abstraction will be coming.
Added documentation for the StorageProxyFactory abstraction. Will build on this documentation as we expand the factory.
https://github.com/USCDataScience/sparkler/wiki/StorageProxyFactory-Abstraction
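For readers who have not opened the wiki page, here is a minimal sketch of the general idea behind a storage proxy factory: the engine name read from the config selects which backend object gets created. This is not the code from this PR. The class `StorageFactorySketch`, the one-method `StorageProxy` interface, the `create` method, and the `"memory"` placeholder backend are all hypothetical names invented for illustration; the real `StorageProxy` and `StorageProxyFactory` in Sparkler may look quite different.

```java
import java.util.Locale;

// Illustrative sketch only: all names below are hypothetical and do not
// mirror Sparkler's actual StorageProxy / StorageProxyFactory classes.
public class StorageFactorySketch {

    /** Hypothetical minimal contract that each storage backend would implement. */
    public interface StorageProxy {
        String name();
    }

    /** Stand-in for a Solr-backed proxy (no real Solr client is created here). */
    public static final class SolrProxy implements StorageProxy {
        @Override
        public String name() {
            return "solr";
        }
    }

    /** Placeholder second backend, included only to show how the factory branches. */
    public static final class InMemoryProxy implements StorageProxy {
        @Override
        public String name() {
            return "memory";
        }
    }

    /**
     * Picks a backend from the engine name, e.g. a value read from a config file.
     * Matching is case-insensitive and ignores surrounding whitespace.
     */
    public static StorageProxy create(String engine) {
        if (engine == null) {
            throw new IllegalArgumentException("Storage engine must be set");
        }
        switch (engine.trim().toLowerCase(Locale.ROOT)) {
            case "solr":
                return new SolrProxy();
            case "memory":
                return new InMemoryProxy();
            default:
                throw new IllegalArgumentException("Unknown storage engine: " + engine);
        }
    }

    public static void main(String[] args) {
        // In a real setup the engine name would come from the config file.
        StorageProxy proxy = create("solr");
        System.out.println("Selected backend: " + proxy.name());
    }
}
```

The point of the pattern is that calling code depends only on the interface, so supporting another backend means adding one class and one branch in the factory rather than touching every caller.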
Also confirmed that this DOES NOT introduce a regression. Thank you @felixloesing and @kyan2601
What changes were proposed in this pull request?
Changes to sparkler-default.yaml configuration:
Added StorageProxy and StorageProxyFactory:
Is this related to an already existing issue on sparkler?
Related to #211
Related to #218
Will it close an existing issue?
Does not close an issue yet.
How was this patch tested?
This patch was tested by running "mvn clean package" within the "sparkler-core" directory. The tests currently pass, and it builds successfully.