ute-hoffmann closed 8 months ago
Checked again, and "Picosynechococcus sp. PCC 11901" does not seem to be a valid species name either, so "Synechococcus sp. PCC 11901" would have to be entered instead. That name appears neither in the FASTA headers nor in the downloaded .gff file. A possible solution would be to extract the species field given in the GFF file, although I am not sure whether all GFF files contain this field. For Synechococcus 11901 it is the following line: ##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2579791
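A minimal sketch of that idea in base R, assuming the `##species` line has the form shown above. The helper name `extract_taxonomy_id` is hypothetical and not part of the pipeline; it returns `NA` when the field is missing, since not every GFF file may carry it.

```r
# Hypothetical helper: pull the NCBI taxonomy ID out of the "##species"
# header line of a GFF file. Takes the file's lines as a character vector.
extract_taxonomy_id <- function(gff_lines) {
  # Header directives start with "##"; keep only the species line(s).
  species_line <- grep("^##species[[:space:]]", gff_lines, value = TRUE)
  if (length(species_line) == 0) {
    return(NA_integer_)  # field absent, caller needs a fallback
  }
  # Capture the digits that follow "id=" in the first matching line.
  hit <- regexpr("(?<=id=)[0-9]+", species_line[1], perl = TRUE)
  id_match <- regmatches(species_line[1], hit)
  if (length(id_match) == 0) {
    return(NA_integer_)  # species line present but without an "id=" part
  }
  as.integer(id_match)
}

# Example with the header line from this issue.
header <- c(
  "##gff-version 3",
  "##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2579791"
)
tax_id <- extract_taxonomy_id(header)
print(tax_id)
```

With a real file, the lines could come from something like `readLines(genome_gff, n = 100)`, and the result could then be passed as `taxonomyId` instead of a hardcoded number, provided it is not `NA`.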
Good find! I will try to reproduce this and see how to fix it. Ideally there is an automatic solution, so that the user does not need to specify the species name manually.
While checking, it was also not clear to me whether the downloaded metadata matters at all for downstream analyses. If it does not, a possible solution might be to simply supply a mock species ID or species name and ignore the metadata.
@ute-hoffmann check out the dev branch, where this is fixed now.
When using "GCF_005577135.1", the species name is parsed as "Picosynechococcus sp. PCC 11901 chromosome". The correct, valid organism name would be "Synechococcus sp. PCC 11901" or "Picosynechococcus sp. PCC 11901" (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2579791). This causes (in create_bsgenome.R)
txdb <- quiet_txdb( file = genome_gff, organism = genome_name, chrominfo = seqinfo_genome )$result
to break, which can be fixed by hardcoding the taxonomyId:
txdb <- quiet_txdb( file = genome_gff, organism = genome_name, chrominfo = seqinfo_genome, taxonomyId = 2579791 )$result
The same happens in design_guides.R in:
txdb <- makeTxDbFromGFF( file = genome_gff, organism = unname(genome(seqinfo_genome)[1]), chrominfo = seqinfo_genome )