Closed lifei176 closed 4 months ago
Open config.yaml and set biomart_host to "http://nov2020.archive.ensembl.org".
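If you would rather script the change than edit the file by hand, here is a minimal Python sketch that rewrites the `biomart_host` line as plain text. It assumes `biomart_host` is a top-level key whose value fits on one line. The helper name `set_biomart_host` and the sample config in the demo are hypothetical and are not part of the pipeline.

```python
import re

# Archive host suggested in this thread for mm10 / GRCm38.
ARCHIVE_HOST = "http://nov2020.archive.ensembl.org"


def set_biomart_host(config_text: str, host: str = ARCHIVE_HOST) -> str:
    """Return config_text with the value of the top-level biomart_host key replaced.

    Assumption: the key starts at the beginning of a line and its value
    sits on that same line. All other lines are left untouched.
    """
    pattern = re.compile(r"^(biomart_host[ \t]*:[ \t]*).*$", flags=re.MULTILINE)
    # A function is used as the replacement so the host is inserted literally.
    new_text, count = pattern.subn(lambda m: f'{m.group(1)}"{host}"', config_text)
    if count == 0:
        raise KeyError("biomart_host not found in config text")
    return new_text


if __name__ == "__main__":
    # Illustrative config content only; the real config.yaml has other keys.
    sample = 'other_key: 1\nbiomart_host: "http://www.ensembl.org"\n'
    print(set_biomart_host(sample))
```

To apply it to a real file, read config.yaml into a string, pass it through the function, and write the result back. Keep a backup, since this is a text substitution and not a YAML-aware edit.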
@lifei176
Indeed, the answer of @Raghav1881 is correct.
Thank you :)
Best,
S
Quick question on this: when I replaced the BioMart URL with the archive for mm10 (nov2020), the download_genome_annotations job retrieved GRCm38.p6 as expected, but the NCBI assembly information gathered afterwards appears to be associated with mm39. Will this lead to issues?
####### Example output
2024-08-01 11:59:07,085 Download gene annotation INFO Using genome: GRCm38.p6
2024-08-01 11:59:07,099 Download gene annotation INFO Found corresponding genome Id 52 on NCBI
2024-08-01 11:59:07,616 Download gene annotation INFO Found corresponding assembly Id 7358741 on NCBI
2024-08-01 11:59:08,133 Download gene annotation INFO Downloading assembly information from: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.27_GRCm39/GCF_000001635.27_GRCm39_assembly_report.txt
^ A quick check of a few loci from genome_annotations.tsv matches the mm10 annotation, so I think it is probably fine, but I want to make sure nothing downstream is complicated by the mismatch.
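For anyone who wants to detect this kind of mismatch automatically, here is a rough Python sketch that pulls the two assembly names out of log text in the format shown above and compares them. It relies only on the "Using genome:" line and on the directory name in the assembly report URL. The function names are hypothetical, and the check only flags a name mismatch; it says nothing about whether downstream steps are affected.

```python
import re


def assembly_names(log_text: str):
    """Return (annotation_genome, ncbi_assembly) parsed from the log text.

    Either element is None if the corresponding line is not found.
    """
    genome = re.search(r"Using genome:\s*(\S+)", log_text)
    # Directory names look like GCF_000001635.27_GRCm39; capture the trailing name.
    report = re.search(
        r"/GC[AF]_\d+\.\d+_([^/]+)/[^/\s]*_assembly_report\.txt", log_text
    )
    return (
        genome.group(1) if genome else None,
        report.group(1) if report else None,
    )


def same_major_build(genome: str, assembly: str) -> bool:
    """Compare two assembly names, ignoring a patch suffix such as '.p6'."""
    def strip_patch(name: str) -> str:
        return re.sub(r"\.p\d+$", "", name)

    return strip_patch(genome) == strip_patch(assembly)


if __name__ == "__main__":
    log = (
        "2024-08-01 11:59:07,085 Download gene annotation INFO Using genome: GRCm38.p6\n"
        "2024-08-01 11:59:08,133 Download gene annotation INFO Downloading assembly "
        "information from: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/"
        "GCF_000001635.27_GRCm39/GCF_000001635.27_GRCm39_assembly_report.txt\n"
    )
    genome, assembly = assembly_names(log)
    if genome and assembly and not same_major_build(genome, assembly):
        print(f"Assembly mismatch: annotation {genome} vs NCBI report {assembly}")
```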
In the "download_genome_annotations" step, "GRCm39" is the target genome for downstream analyses when the species is mouse. However, "GRCm38" (mm10) is much more widely used than "GRCm39". Can we change the code so that "GRCm38" can be the target genome? This would help the community a lot, especially those exploring mouse data. Thank you in advance.