Thomieh73 opened this issue 3 years ago
Okay, today I ran into the same problem with get_abundances. The difference in this run was that I had changed the min-length parameter to 1300.
One dataset crashed again, and this time I investigated which taxon was causing the error. I extracted the following taxon list from barcode14.trimmed.fastq_nanoclust_out.txt:
sciname,taxid
Glaciecola amylolytica,2489595
Thiolapillus brandeum,1076588
Candidatus Pelagibacter ubique HTCC1062,335992
Polaribacter butkevichii,218490
Gemmatimonas aurantiaca T-27,379066
Thalassomonas haliotis,485448
Arenibacter nanhaiticus,558155
Phaeocystidibacter marisrubri,1577780
Methylophilus methylotrophus,17
Poseidonibacter lekithochrous,1904463
Glaciecola amylolytica,2489595
Alkalimarinus sediminis,1632866
Actinomarinicola tropica,2789776
Owenweeksia hongkongensis DSM 17368,926562
Roseobacter litoralis,42443
Candidatus Pelagibacter ubique HTCC1062,335992
I then queried the Unipept taxonomy API with these taxon IDs:
curl -X POST -H 'Accept: application/json' api.unipept.ugent.be/api/v1/taxonomy \
-d 'input[]=2489595' \
-d 'input[]=1076588' \
-d 'input[]=335992' \
-d 'input[]=218490' \
-d 'input[]=485448' \
-d 'input[]=558155' \
-d 'input[]=1577780' \
-d 'input[]=17' \
-d 'input[]=1904463' \
-d 'input[]=2489595' \
-d 'input[]=1632866' \
-d 'input[]=2789776' \
-d 'input[]=926562' \
-d 'input[]=42443' \
-d 'input[]=335992'
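Typing one `-d 'input[]=…'` line per taxon gets tedious for every sample. Below is a minimal POSIX shell sketch that builds the same POST from a `sciname,taxid` list. It assumes the list has a header row, the taxid is in the last column, and no scientific name contains a comma. The function names `unique_taxids` and `query_unipept_taxonomy` are hypothetical, not part of NanoCLUST.

```shell
# Print each taxid once, in first-seen order (2489595 and 335992 appear
# twice in the barcode14 list). Skips the header row.
unique_taxids() {
  awk -F',' 'NR > 1 && $NF != "" && !seen[$NF]++ { print $NF }' "$1"
}

# Send one -d 'input[]=<taxid>' argument per unique taxid, mirroring the
# hand-written curl command above.
query_unipept_taxonomy() {
  list=$1
  set --
  for id in $(unique_taxids "$list"); do
    set -- "$@" -d "input[]=$id"
  done
  curl -s -X POST -H 'Accept: application/json' \
    api.unipept.ugent.be/api/v1/taxonomy "$@"
}
```

For example, `query_unipept_taxonomy barcode14_taxa.csv > barcode14_taxonomy.json`, where `barcode14_taxa.csv` is a hypothetical file holding the list above.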
This is the output:
[{"taxon_id":2489595,"taxon_name":"Glaciecola sp. THG-3.7","taxon_rank":"species"},
{"taxon_id":1076588,"taxon_name":"Thiolapillus brandeum","taxon_rank":"species"},
{"taxon_id":335992,"taxon_name":"Candidatus Pelagibacter ubique HTCC1062","taxon_rank":"no rank"},
{"taxon_id":218490,"taxon_name":"Polaribacter butkevichii","taxon_rank":"species"},
{"taxon_id":485448,"taxon_name":"Thalassomonas haliotis","taxon_rank":"species"},
{"taxon_id":558155,"taxon_name":"Arenibacter nanhaiticus","taxon_rank":"species"},
{"taxon_id":1577780,"taxon_name":"Phaeocystidibacter marisrubri","taxon_rank":"species"},
{"taxon_id":17,"taxon_name":"Methylophilus methylotrophus","taxon_rank":"species"},
{"taxon_id":1904463,"taxon_name":"Arcobacter lekithochrous","taxon_rank":"species"},
{"taxon_id":2489595,"taxon_name":"Glaciecola sp. THG-3.7","taxon_rank":"species"},
{"taxon_id":1632866,"taxon_name":"Alkalimarinus sediminis","taxon_rank":"species"},
{"taxon_id":926562,"taxon_name":"Owenweeksia hongkongensis DSM 17368","taxon_rank":"no rank"},
{"taxon_id":42443,"taxon_name":"Roseobacter litoralis","taxon_rank":"species"},
{"taxon_id":335992,"taxon_name":"Candidatus Pelagibacter ubique HTCC1062","taxon_rank":"no rank"}]
Checking the results, I noticed that one taxon returned nothing: Actinomarinicola tropica (taxid 2789776).
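Spotting the missing entry by eye works for 15 taxa but not across many samples. A minimal sketch that compares the requested taxids against a saved response; it assumes the response is the compact JSON array shown above and extracts `taxon_id` values with `grep` rather than a real JSON parser. `missing_taxids` is a hypothetical helper name.

```shell
# Usage: missing_taxids RESPONSE_FILE TAXID...
# Prints every requested taxid that does not appear in the response.
missing_taxids() {
  response=$1
  shift
  # Space-delimited list of returned ids, e.g. " 2489595 1076588 ... ".
  found=" $(grep -o '"taxon_id": *[0-9]*' "$response" | cut -d: -f2 | tr '\n' ' ') "
  for id in "$@"; do
    case "$found" in
      *" $id "*) ;;              # returned by Unipept
      *) printf '%s\n' "$id" ;;  # absent from the response
    esac
  done
}
```

Given the output above saved to a file, `missing_taxids barcode14_taxonomy.json 2489595 1076588 2789776` should print only `2789776`. An empty `[]` response makes every requested taxid count as missing.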
Querying that taxon on its own:
curl -X POST -H 'Accept: application/json' api.unipept.ugent.be/api/v1/taxonomy \
-d 'input[]=2789776'
returns an empty array: []
I then removed that sample and restarted the pipeline. It crashed on another sample. Checking the species overview of that sample, I found:
sciname,taxid
Owenweeksia hongkongensis DSM 17368,926562
Leisingera methylohalidivorans DSM 14336,999552
Actinomarinicola tropica,2789776
Poseidonibacter lekithochrous,1904463
Owenweeksia hongkongensis DSM 17368,926562
Candidatus Pelagibacter ubique HTCC1062,335992
Glaciecola amylolytica,2489595
Polaribacter franzmannii ATCC 700399,1248440
Methylotenera mobilis JLW8,583345
Roseobacter litoralis,42443
Glaciecola amylolytica,2489595
Candidatus Pelagibacter ubique HTCC1062,335992
Alkalimarinus sediminis,1632866
Thalassomonas haliotis,485448
Nisaea nitritireducens,568392
Amylibacter marinus,1475483
I then tested whether removing Actinomarinicola tropica (taxid 2789776) from the file barcode38.trimmed.fastq.nanoclust_out.txt is a workable option.
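Removing the offending row by hand can be scripted. This is a minimal sketch, assuming the file is comma-separated with a header row containing a column literally named `taxid` and no quoted fields with embedded commas; the real nanoclust_out.txt layout may differ, so check it first. `drop_taxid` is a hypothetical helper name.

```shell
# Usage: drop_taxid FILE TAXID > FILTERED_FILE
# Keeps the header and every row whose "taxid" column differs from TAXID.
# If no "taxid" column is found, the file is passed through unchanged.
drop_taxid() {
  awk -F',' -v drop="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "taxid") col = i; print; next }
    col && $col == drop { next }
    { print }
  ' "$1"
}
```

For example, `drop_taxid barcode38.trimmed.fastq.nanoclust_out.txt 2789776 > filtered.txt`, then replace the original only after keeping a backup, since re-running the task reads the file in place.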
I then re-ran the task with: bash .command.run
This produced the following files:
barcode38.trimmed.fastq.nanoclust_out.txt
barcode38.trimmed.fastq_nanoclust_out.txt
rel_abundance_barcode38.trimmed.fastq_F.csv
rel_abundance_barcode38.trimmed.fastq_O.csv
rel_abundance_barcode38.trimmed.fastq_G.csv
rel_abundance_barcode38.trimmed.fastq_S.csv
So the step succeeds once that taxon is removed. Since taxa that return an empty response crash the pipeline, I have sent an email to Unipept asking them to look into this issue. I am unsure how to solve it otherwise.
This is the same problem as reported in issue #19.
This is the reply I got from Unipept about this error:
Thank you for contacting us. The taxon you provided `2789776` is not present in our database at this time, that’s why our API does not respond with a valid response to your query. The Unipept database is currently based on UniProt version 2020.01, but we are currently in the process of updating our database to the latest UniProt version.
I’ve also discovered a problem with the `try it` module on our website, but I was able to resolve it (you should be able to use this module again).
Please don’t hesitate to contact us if the problem persists or if you have any other questions.
Kind regards,
So this might be resolved in the near future, once Unipept has updated its database.
However, it is unclear how long the Unipept update will take:
I can’t give a hard deadline for this yet, since we are currently experiencing some issues with our server storage space that we need to sort out first. Our API returns an empty array in case none of the taxa ID’s you provided were found, so make sure that your pipeline can handle this case properly (we do not return an HTTP error code or anything in this case, just an empty array).
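Following Unipept's advice, a caller can treat an empty array as "not found" instead of failing. The sketch below shows the idea in shell for a single-taxon query; it is not the NanoCLUST get_abundances code, and the helper name `taxon_name_or_fallback` and the default label `Unclassified` are hypothetical choices.

```shell
# Reads a taxonomy response on stdin and prints its first taxon_name,
# or the fallback label ($1, default "Unclassified") when Unipept
# returned an empty array.
taxon_name_or_fallback() {
  fallback=${1:-Unclassified}
  name=$(grep -o '"taxon_name": *"[^"]*"' | head -n 1 |
    sed 's/^"taxon_name": *"//; s/"$//')
  if [ -n "$name" ]; then
    printf '%s\n' "$name"
  else
    printf '%s\n' "$fallback"
  fi
}
```

For example, piping `curl -s -X POST -H 'Accept: application/json' api.unipept.ugent.be/api/v1/taxonomy -d 'input[]=2789776'` into `taxon_name_or_fallback` would print `Unclassified` while the taxon is still missing from the Unipept database, instead of breaking the downstream step.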
Hi, I am running the NanoCLUST pipeline and I get an error in the get_abundances step that I don't understand. It occurs for multiple samples in my dataset; when I removed 12 samples from my analysis, the pipeline finished.
This is the input file for one of the samples: barcode26.trimmed.fastq.nanoclust_out.txt
And this is the .command.log output:
It seems that the genus step of the script is causing the error, but I don't understand why.
This is the .command.sh file for the process: