HW-SWeL / BMUSE

Bioschemas Mark Up Scraper and Extractor
https://app.swaggerhub.com/apis-docs/swel/BMUSE/
Apache License 2.0

How to pass static configuration through url list file #86

Status: Open. AlasdairGray opened this issue 3 years ago.

AlasdairGray commented 3 years ago

@petrospaps I'm trying to override the dynamic=true parameter in the localconfig.properties file by appending a value to the end of each line in the URL list file.

Following the instructions in the README did not result in the static scraper being used. Below are the URL list file entries I tried:

https://bgee.org/sitemap_main.xml,static
https://bgee.org/?page=gene&gene_id=ENSG00000274928,static
https://bgee.org/?page=gene&gene_id=ENSG00000274928, static

None of these overrode the dynamic setting in the local config file.

Even removing the setting from the local config file did not result in the static scraper being used.

I'm basing this on the following excerpt of the log file:

11:26:16.619 [INFO] hwu.elixir.scrape.scraper.examples.FileScraper - Attempting to scrape: https://bgee.org/?page=gene&gene_id=ENSG00000274928
11:26:16.619 [INFO] hwu.elixir.scrape.scraper.ScraperFilteredCore - dynamic scraping setting
11:26:27.713 [ERROR] hwu.elixir.scrape.scraper.ScraperCore - URL timed out: https://bgee.org/?page=gene&gene_id=ENSG00000274928. Trying JSoup.
11:26:28.295 [DEBUG] hwu.elixir.scrape.scraper.ScraperFilteredCore - Number of JSONLD sections: 0
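To make the expected behaviour concrete, here is a minimal sketch of how a per-line override in the URL list file could be resolved against the `dynamic` property in `localconfig.properties`. It assumes each line has the form `url` or `url,static` / `url,dynamic`, that whitespace around the override (as in the `, static` entry above) is trimmed, that a line without a recognised override falls back to the properties file, and that a missing `dynamic` property means static. The class and method names (`UrlListOverride`, `parse`, `configDynamic`) are hypothetical; this is not BMUSE's actual implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper, not part of BMUSE: decides per URL-list entry
// whether the dynamic or the static scraper should be used.
final class UrlListOverride {

    /** One parsed line of the URL list file. */
    static final class Entry {
        final String url;
        final boolean dynamic;

        Entry(String url, boolean dynamic) {
            this.url = url;
            this.dynamic = dynamic;
        }
    }

    /**
     * Parses "url", "url,static" or "url,dynamic" (assumed format).
     * A per-line override wins over the config default; otherwise the
     * whole line is treated as the URL and the config default applies.
     */
    static Entry parse(String line, boolean configDynamic) {
        String trimmed = line.trim();
        int comma = trimmed.lastIndexOf(',');
        if (comma >= 0) {
            // Trim so that "url, static" behaves the same as "url,static"
            String mode = trimmed.substring(comma + 1).trim().toLowerCase(Locale.ROOT);
            String url = trimmed.substring(0, comma).trim();
            if (mode.equals("static")) {
                return new Entry(url, false);
            }
            if (mode.equals("dynamic")) {
                return new Entry(url, true);
            }
        }
        // No recognised override: keep the line intact as the URL
        return new Entry(trimmed, configDynamic);
    }

    /** Reads the "dynamic" property; an absent property means static (assumption). */
    static boolean configDynamic(Properties props) {
        return Boolean.parseBoolean(props.getProperty("dynamic", "false").trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("dynamic=true\n"));
        boolean configDefault = configDynamic(props);

        String[] lines = {
            "https://bgee.org/sitemap_main.xml,static",
            "https://bgee.org/?page=gene&gene_id=ENSG00000274928, static",
            "https://bgee.org/?page=gene&gene_id=ENSG00000274928"
        };
        for (String line : lines) {
            Entry e = parse(line, configDefault);
            System.out.println(e.url + " -> " + (e.dynamic ? "dynamic" : "static"));
        }
    }
}
```

Under these assumptions the first two entries resolve to static despite `dynamic=true`, and only the entry without an override inherits the dynamic setting, which is the behaviour the report expected.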
AlasdairGray commented 3 years ago

Equivalent log messages when static is set in the configuration file

tatic scraping setting
11:31:29.682 [DEBUG] hwu.elixir.scrape.scraper.ScraperFilteredCore - Number of JSONLD sections: 0
11:31:29.728 [INFO] hwu.elixir.scrape.scraper.ScraperCore - https://bgee.org/?page=gene&gene_id=ENSA
AlasdairGray commented 3 years ago

Related to #85

AlasdairGray commented 2 years ago

Need to document how to annotate the configuration file