callahantiff / PheKnowLator

PheKnowLator: Heterogeneous Biomedical Knowledge Graphs and Benchmarks Constructed Under Alternative Semantic Models
https://github.com/callahantiff/PheKnowLator/wiki
Apache License 2.0

CTD Data Source - CAPTCHA #35

Closed: callahantiff closed this issue 4 years ago

callahantiff commented 4 years ago

Issue: CTD now has a CAPTCHA in place to prevent automatic downloading of data. This breaks the current build, which downloads the CTD files automatically, and there is currently no workaround for the CAPTCHA built into the pipeline.

Temporary Workaround: All CTD data sources must be downloaded manually into the resources/edge_data directory before running the download step of the build. Each downloaded file must also be unzipped and have its edge type label prepended to the file name (example below).


File: edge_source_list.txt

chemical-disease, http://ctdbase.org/reports/CTD_chemicals_diseases.tsv.gz
chemical-gene, http://ctdbase.org/reports/CTD_chem_gene_ixns.tsv.gz
chemical-phenotype, http://ctdbase.org/reports/CTD_chemicals_diseases.tsv.gz
chemical-protein, http://ctdbase.org/reports/CTD_chem_gene_ixns.tsv.gz

Repository: resources/edge_data/
chemical-disease_CTD_chemicals_diseases.tsv
chemical-gene_CTD_chem_gene_ixns.tsv
chemical-phenotype_CTD_chemicals_diseases.tsv
chemical-protein_CTD_chem_gene_ixns.tsv
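The unzip-and-rename step of the workaround can be scripted. Below is a minimal sketch, not part of PheKnowLator itself, assuming the `.gz` files have already been downloaded by hand (through the CAPTCHA) into one folder and that edge_source_list.txt uses the `edge-type, url` format shown above. The function names `parse_edge_sources`, `target_name`, and `stage_ctd_files` are hypothetical and exist only for illustration.

```python
import gzip
import shutil
from pathlib import Path
from urllib.parse import urlparse


def parse_edge_sources(text):
    """Parse 'edge-type, url' lines into (edge_type, url) pairs, skipping blank lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first comma only, then trim whitespace around both parts.
        edge_type, url = (part.strip() for part in line.split(",", 1))
        pairs.append((edge_type, url))
    return pairs


def target_name(edge_type, url):
    """Build '<edge-type>_<file name without .gz>' from a source URL."""
    file_name = Path(urlparse(url).path).name
    if file_name.endswith(".gz"):
        file_name = file_name[: -len(".gz")]
    return f"{edge_type}_{file_name}"


def stage_ctd_files(sources, download_dir, edge_data_dir):
    """Decompress each manually downloaded .gz file into edge_data_dir under its labeled name.

    Assumption: each file sits in download_dir under the same name as in its URL.
    A file listed for several edge types (e.g. CTD_chemicals_diseases.tsv.gz)
    is decompressed once per edge type, matching the listing above.
    """
    download_dir, edge_data_dir = Path(download_dir), Path(edge_data_dir)
    edge_data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for edge_type, url in sources:
        gz_path = download_dir / Path(urlparse(url).path).name
        out_path = edge_data_dir / target_name(edge_type, url)
        # Stream the decompressed bytes straight into the renamed output file.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

Usage, assuming edge_source_list.txt is in the working directory and the files were saved to `~/Downloads`: `stage_ctd_files(parse_edge_sources(Path("edge_source_list.txt").read_text()), Path.home() / "Downloads", "resources/edge_data")`. Only the CTD lines need to be passed in if the list also contains sources that still download automatically.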

callahantiff commented 4 years ago

RESOLUTION: Worked with CTD, and for now they have agreed to lift the CAPTCHA constraint, provided they don't experience any more attacks.