denglab / SeqSero2


SeqSero2 "freezes" on particular sample #48

Open eam12 opened 1 year ago

eam12 commented 1 year ago

Thank you for creating SeqSero2. I use it almost every day! I'm currently running SeqSero2 (v.1.1.1) on thousands of samples, but for one particular sample SeqSero2 freezes every time at the "assembling..." step. I've waited up to 12 hours for the process to finish, but nothing happens. All other samples have run successfully within minutes, so I'm not sure what it is about this particular one. The FASTQ files look completely normal, as does the assembly. In the data_log.txt file the last few lines printed are as follows:

  0:00:03.150    17M / 32M   INFO    General                 (kmer_coverage_model.cpp   : 259)   Fitting coverage model
  0:00:03.154    17M / 32M   INFO    General                 (kmer_coverage_model.cpp   : 295)   ... iteration 2
  0:00:03.161    17M / 32M   INFO    General                 (kmer_coverage_model.cpp   : 295)   ... iteration 4
  0:00:03.171    17M / 32M   INFO    General                 (kmer_coverage_model.cpp   : 295)   ... iteration 8
  0:00:03.191    17M / 32M   INFO    General                 (kmer_coverage_model.cpp   : 295)   ... iteration 16
  0:00:03.223    17M / 32M   INFO    General                 (kmer_coverage_model.cpp   : 295)   ... iteration 32
  0:00:03.285    17M / 32M   INFO    General                 (kmer_coverage_model.cpp   : 295)   ... iteration 64

I can send the paired FASTQ files upon request.
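
For readers triaging a similar hang, here is a minimal sketch of pulling the last recorded step out of log lines shaped like the excerpt above. The regular expression is inferred from that excerpt only, and `last_log_event` is a hypothetical helper name, not part of SeqSero2 or SPAdes.

```python
import re

# Pattern inferred from the excerpt above (an assumption, not a documented format), e.g.
#   0:00:03.285    17M / 32M   INFO    General   (kmer_coverage_model.cpp   : 295)   ... iteration 64
LOG_LINE = re.compile(
    r"^\s*(?P<elapsed>\d+:\d{2}:\d{2}\.\d+)\s+"          # elapsed time
    r"(?P<mem>\S+)\s*/\s*(?P<peak>\S+)\s+"               # the two memory figures
    r"(?P<level>[A-Z]+)\s+"                              # log level
    r"(?P<module>.+?)\s+"                                # module name
    r"\((?P<source>[^)]*?)\s*:\s*(?P<line>\d+)\)\s+"     # (file : line)
    r"(?P<message>.*)$"                                  # free-text message
)


def last_log_event(lines):
    """Return a dict of fields for the last parseable log line, or None."""
    last = None
    for raw in lines:
        match = LOG_LINE.match(raw)
        if match:
            last = match.groupdict()
    return last


# Example (the path is illustrative):
# with open("data_log.txt") as handle:
#     print(last_log_event(handle))
```

Comparing the `source` and `message` fields of the last event across a hung sample and a normal one may show whether the run always stalls at the same step.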

tongzhouxu commented 1 year ago

Hi,

Could you please provide the command you used and share the FASTQ files so we can try reproducing the error?

Thanks!

eam12 commented 1 year ago

The two FASTQ files are 70MB each so GitHub won't let me upload them. Is there an email I could send them to?

Command I used:

SeqSero2_package.py -s -t 2 -p 12 -i SRR17736741_trim_R1_paired.fastq.gz SRR17736741_trim_R2_paired.fastq.gz -d SRR17736741_seqsero2 -n SRR17736741
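
When a batch covers thousands of samples, one way to keep a single hung sample from blocking the rest is to run each command under a time limit. The sketch below is one possible approach, not something SeqSero2 provides: `run_with_timeout` and `seqsero2_cmd` are hypothetical helpers, the flags are copied from the command above, and the process-group handling assumes a POSIX system.

```python
import os
import signal
import subprocess


def seqsero2_cmd(r1, r2, out_dir, name, threads=12):
    """Build the argument list, using the same flags as the command above."""
    return ["SeqSero2_package.py", "-s", "-t", "2", "-p", str(threads),
            "-i", r1, r2, "-d", out_dir, "-n", name]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, returncode).

    status is 'ok', 'failed', or 'timeout'. The child is started in its own
    session so that any helper processes it launches can be killed with it.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # With start_new_session=True the child's PID is also its group ID.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited just before we tried to kill it
        proc.wait()
        return ("timeout", None)
    return ("ok" if returncode == 0 else "failed", returncode)


# Example: allow 30 minutes per sample (an arbitrary limit).
# status, code = run_with_timeout(
#     seqsero2_cmd("SRR17736741_trim_R1_paired.fastq.gz",
#                  "SRR17736741_trim_R2_paired.fastq.gz",
#                  "SRR17736741_seqsero2", "SRR17736741"),
#     timeout_s=30 * 60)
```

Samples that come back as `timeout` can then be collected and investigated separately.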

A couple more things:

tongzhouxu commented 1 year ago

It might be a bug with SPAdes. Could you please try updating SPAdes to v3.9.0 and see if you still have the same issue? If the problem persists, please send the raw reads to tongzhou.xu@uga.edu and we will try to reproduce the error.

Thanks!

eam12 commented 1 year ago

It worked! Well, it didn't give me a serovar prediction (No serotype antigens were detected. This is an atypical result that should be further investigated.), but at least it didn't freeze! Many, many thanks for the suggestion.

denglab commented 1 year ago

Hi,

Glad to know you were able to fix the bug. From power users like you, we will be very happy to hear feedback on “atypical” serotypes, which may yield new alleles that we could add to SeqSero2 allele databases.

Best, Xiangyu

In reply to eam12 (sent Tuesday, August 1, 2023, 11:05 AM).


eam12 commented 1 year ago

To follow up on this a few weeks later, I'm now receiving the following message for every sample I try to run:

Note:   No serotype antigens were detected. This is an atypical result that should be further investigated. 

When I look at the log file I see the error message:

== Error ==  python version 3.7 is not supported!
Supported versions are 2.4, 2.5, 2.6, 2.7, 3.2, 3.3, 3.4, 3.5

So it looks like version 3.9.0 of spades (the version you suggested I downgrade to) isn't compatible with the version of python I'm using (v.3.8). To fix this, I tried downgrading the version of python to 3.5, but that led to a number of additional package conflicts that couldn't be solved:

Problem: package biopython-1.73-py37h14c3975_0 requires python >=3.7,<3.8.0a0, but none of the providers can be installed

I did try running standalone spades (v.3.15.5) on the troublesome sample and it ran to completion so could the issue be with how SeqSero2 interacts with spades?

At this point, I'm not really sure how to fix this issue. Do you have any further suggestions? Thanks so much!
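
Since the "No serotype antigens were detected" note appeared here while the log held an assembler error, it may be worth scanning the log before trusting such a result. A minimal sketch, assuming the `== Error ==` marker seen in the excerpt above is what a failed run leaves behind; `find_error_lines` is a hypothetical helper and the log path in the example is illustrative.

```python
# Marker text copied from the log excerpt above; other failures may use different wording.
ERROR_MARKER = "== Error =="


def find_error_lines(log_text):
    """Return the stripped log lines that contain the error marker."""
    return [line.strip() for line in log_text.splitlines() if ERROR_MARKER in line]


# Example (the path is illustrative):
# with open("SRR17736741_seqsero2/data_log.txt") as handle:
#     errors = find_error_lines(handle.read())
# if errors:
#     print("Assembly may have failed; do not read the result as atypical:", errors)
```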

tongzhouxu commented 1 year ago

Hi,

Downgrading python could lead to conflicts. I would suggest creating a new conda environment and reinstalling seqsero2 using:

conda create -n seqsero2 python=3.6
conda install -c bioconda seqsero2=1.2.1

This should install seqsero2 along with the latest compatible version of spades. Please let me know if the problem persists.

Thanks, Tongzhou

eam12 commented 1 year ago

Hi Tongzhou,

Thanks so much for your reply. Unfortunately, it still seems to be "freezing" during assembly. The version of spades being used is v.3.14.1. Perhaps an older version of spades is required? I'm now experimenting with spades versions newer than v.3.9 but older than v.3.14.1.

tongzhouxu commented 1 year ago

Hi,

I downloaded SRR17736741_1.fastq and SRR17736741_2.fastq from NCBI and ran SeqSero2 with no issue. I tested with spades.py v3.14.1 and v3.15.2 on Ubuntu. I noticed a similar bug reported here: https://github.com/ablab/spades/issues/372. So it might be a bug that is specific to certain spades versions. I would suggest updating spades.py using conda instead of downgrading.

Thanks, Tongzhou
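
Because the behaviour in this thread seems to depend on which Python and SPAdes versions are paired, recording both alongside each run can make reports easier to compare. The sketch below assumes `spades.py --version` prints a string containing a three-part version number; `parse_version` and `tool_version` are hypothetical helpers. Output is read from stdout and stderr together, since which stream a given release writes to is not confirmed here.

```python
import re
import subprocess
import sys


def parse_version(text):
    """Extract the first X.Y.Z version in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def tool_version(cmd):
    """Run a version command and parse its output; None if it cannot be run."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_version(proc.stdout)


# Example:
# print("python:", sys.version_info[:3])
# print("spades:", tool_version(["spades.py", "--version"]))
```

Tuples compare element by element, so a check such as `tool_version(...) >= (3, 14, 1)` orders versions correctly where a plain string comparison would not.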

eam12 commented 1 year ago

Hi Tongzhou,

I did process the SRR17736741 FASTQ files through Trimmomatic so perhaps that is why you have been able to successfully run SRR17736741 through SeqSero2 and I have not.

I've already tried SeqSero2 with the most recent spades release (v.3.15.5), in addition to v.3.14.1, and it still hangs for SRR17736741:

% SeqSero2_package.py -s -t 2 -p 12 -i ~/SRR17736741_trim_R1_paired.fastq.gz ~/SRR17736741_trim_R2_paired.fastq.gz -d ~/SRR17736741_seqsero2 -n SRR17736741
building database...
mapping...
check samtools version: 1.17
[bam_sort_core] merging from 0 files and 12 in-memory blocks...
assembling...

When I run spades.py by itself (both v.3.14.1 and v.3.15.5, independently of SeqSero2), SRR17736741 runs with no problems:

% spades.py -1 ~/SRR17736741_trim_R1_paired.fastq.gz -2 ~/SRR17736741_trim_R2_paired.fastq.gz -o ~/SRR17736741_spades -t 12
======= SPAdes pipeline finished.

SPAdes log can be found here: ~/SRR17736741_spades/spades.log

Thank you for using SPAdes!

I guess I'm confused as to why spades would run perfectly on its own, but not within the context of SeqSero2.