aljpetri / isONform

De novo construction of isoforms from long-read data
GNU General Public License v3.0

Error during running isONform #21

Open NHoang98 opened 3 weeks ago

NHoang98 commented 3 weeks ago

Hello, we are currently working on a de novo transcriptome assembly project. We ran isONclust and isONcorrect separately instead of using the isON_pipeline.sh pipeline. Everything went fine until the last step, isONform, where we encountered the following error:

Running isONform batch_id:893.0... multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/cmmr/anaconda3/envs/isonform/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/cmmr/anaconda3/envs/isonform/bin/isONform_parallel", line 51, in isONform
    subprocess.check_call(
  File "/home/cmmr/anaconda3/envs/isonform/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/home/cmmr/anaconda3/envs/isonform/bin/main', '--fastq', '/tmp/tmpfqngraa5/split_in_batches/892_0.fastq', '--outfolder', 'Documents/Gac/RNAseq/4.mapping/isONclust/Aril/isoform/A3/892', '--exact_instance_limit', '50', '--k', '20', '--w', '31', '--xmin', '18', '--xmax', '80', '--delta_len', '5', '--exact', '--parallel', 'True', '--delta_iso_len_3', '30', '--delta_iso_len_5', '50']' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cmmr/anaconda3/envs/isonform/bin/isONform_parallel", line 353, in <module>
    main(args)
  File "/home/cmmr/anaconda3/envs/isonform/bin/isONform_parallel", line 279, in main
    for x in pool.imap_unordered(isONform, instances):
  File "/home/cmmr/anaconda3/envs/isonform/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
subprocess.CalledProcessError: Command '['python', '/home/cmmr/anaconda3/envs/isonform/bin/main', '--fastq', '/tmp/tmpfqngraa5/split_in_batches/892_0.fastq', '--outfolder', 'Documents/Gac/RNAseq/4.mapping/isONclust/Aril/isoform/A3/892', '--exact_instance_limit', '50', '--k', '20', '--w', '31', '--xmin', '18', '--xmax', '80', '--delta_len', '5', '--exact', '--parallel', 'True', '--delta_iso_len_3', '30', '--delta_iso_len_5', '50']' returned non-zero exit status 1.

The package was installed with pip in a newly created Anaconda environment. The analysis was run via: isONform_parallel --fastq_folder Documents/Gac/RNAseq/4.mapping/isONclust/Aril/polish/A3 --outfolder Documents/Gac/RNAseq/4.mapping/isONclust/Aril/isoform/A3 --split_wrt_batches and was canceled after a few hundred batches.

FYI: we run our analysis on a local workstation.

Could you please let me know how to fix this error? Thanks

aljpetri commented 3 weeks ago

Hi, thank you for reporting the error. Could you please send me the stderr.txt file that should have been generated in the outfolder of the problematic cluster (Documents/Gac/RNAseq/4.mapping/isONclust/Aril/isoform/A3/892)? This should give me the information I need about what caused the problem.
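
For readers who hit the same CalledProcessError and want to look at every batch at once, the following is a minimal sketch that gathers the non-empty stderr.txt files below an outfolder. It assumes, based only on the comment above, that each batch subfolder contains a file named stderr.txt; the function name `collect_stderr_tails` and the `tail_lines` parameter are hypothetical and not part of isONform.

```python
from pathlib import Path


def collect_stderr_tails(outfolder, tail_lines=20):
    """Return {subfolder name: last lines of its stderr.txt} for non-empty files.

    Assumption (not checked against the isONform source): every batch
    subfolder directly below `outfolder` may hold a file called stderr.txt.
    """
    tails = {}
    for err_file in sorted(Path(outfolder).glob("*/stderr.txt")):
        # errors="replace" keeps the scan going if a log contains odd bytes
        text = err_file.read_text(errors="replace")
        if text.strip():
            lines = text.splitlines()
            tails[err_file.parent.name] = "\n".join(lines[-tail_lines:])
    return tails
```

Calling `collect_stderr_tails("Documents/Gac/RNAseq/4.mapping/isONclust/Aril/isoform/A3")` would then return one entry per batch whose stderr.txt is not empty, which narrows down which batch to report.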

NHoang98 commented 3 weeks ago

Hi @aljpetri, thank you for the quick response. This is the report file from cluster 892: stderr.txt

aljpetri commented 3 weeks ago

Hello, from the file I can see that isONform was not able to locate the necessary file in the temporary folder. This might happen because the system limits the number of files that can be generated. Do you know of any file number limits on your system? If a file limit is the actual problem, it could help to run isONform without the --split_wrt_batches argument. I hope this helps.
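
To answer the question about file limits on a Linux workstation, a short check like the one below can help. It is only a sketch: the thread does not say which limit is involved, so it reports two common candidates, the per-process open-file limit and the inode capacity of the filesystem holding the temporary folder. The function names are hypothetical, and the `resource` module is only available on Unix-like systems.

```python
import os
import resource  # Unix-only standard library module


def file_limits(path="/tmp"):
    """Report the open-file limit of this process and inode usage for `path`."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    stats = os.statvfs(path)
    return {
        "open_files_soft": soft,      # limit currently enforced for this process
        "open_files_hard": hard,      # ceiling the soft limit may be raised to
        "inodes_total": stats.f_files,
        "inodes_free": stats.f_ffree,
    }


def count_files(folder):
    """Count non-directory entries below `folder`, e.g. a temporary batch folder."""
    return sum(len(files) for _, _, files in os.walk(folder))
```

Comparing `count_files` on the temporary folder against `inodes_free` gives a rough idea of whether the number of generated files is anywhere near what the filesystem can hold.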

NHoang98 commented 3 weeks ago

Well, we just re-ran without --split_wrt_batches. The process finished after a short time, but it turns out there are only empty files. Also, sadly, my correction folder has been removed (it's kinda weird tho). It's kinda stressful that we have to re-run all of the correction steps.

Here is the output from the terminal:

Printing instances
<multiprocessing.context.SpawnContext object at 0x7f2313995fd0>
Environment set: <multiprocessing.context.SpawnContext object at 0x7f2313995fd0>
Using 8 cores.
Time elapsed multiprocessing: 0.2520012855529785
Merging...
Batch Merging
Generating transcriptome.fasta
Joined back batched files in: 15.027465343475342
Finished full algo after : 15.307952642440796

aljpetri commented 3 weeks ago

Hi, thanks for letting me know. I found that the deletion of the corrected files was caused by a bug in the code, which I have now fixed in a new version (isONform 0.3.7). I am really sorry that you have to rerun the correction step. I am wondering, though, why the resulting files were all empty. Is it possible that the support for the predictions was too low, i.e. no prediction had more than 5 reads as support?
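
One quick way to sanity-check the support question is to count the reads in each corrected cluster file. The sketch below does that under two assumptions: the FASTQ files use exactly four lines per record, and a cluster's read count is only an upper bound on the support any single prediction from that cluster can get. The function names and the `threshold` parameter are hypothetical; the value 5 is taken from the comment above.

```python
from pathlib import Path


def count_fastq_reads(fastq_path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    with open(fastq_path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def clusters_above_threshold(fastq_folder, threshold=5):
    """Return {relative path: read count} for FASTQ files with more than
    `threshold` reads, searching `fastq_folder` recursively."""
    root = Path(fastq_folder)
    result = {}
    for fastq in sorted(root.rglob("*.fastq")):
        n_reads = count_fastq_reads(fastq)
        if n_reads > threshold:
            result[fastq.relative_to(root).as_posix()] = n_reads
    return result
```

If `clusters_above_threshold` returns an empty dictionary for the correction folder, low support would be a plausible explanation for empty output; if it returns many clusters, the cause is more likely elsewhere.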

NHoang98 commented 3 weeks ago

I think it's kinda weird, because if I use --split_wrt_batches it takes 1 to 2 hours before it hits an error, but without it the run finishes after about 5 minutes. FYI: we have about 81k corrected clusters (from 9.7M filtered reads), each supported by at least 3 reads (option -N of write_fastq in isONclust). We use the cDNA ver11 kit (SQK-PCB111.24), so I also set --k 9 --w 10 --max_seqs 1000 in isONcorrect. Could you suggest any options that might improve the result?

aljpetri commented 3 weeks ago

Sorry, just for clarification: did the algorithm finish successfully with --split_wrt_batches disabled?

NHoang98 commented 3 weeks ago

Yes, the terminal output in my earlier comment is from the run without --split_wrt_batches.

aljpetri commented 3 weeks ago

Ok. Could you try running an earlier version of isONform (i.e. 0.3.5) on the dataset and see whether you have the same issues? The bug might be due to a code change in version 0.3.6. I am really sorry for the inconvenience.

NHoang98 commented 3 weeks ago

We totally understand the situation. To make it clear, should we re-run with or without the --split_wrt_batches option? Is the bug that deletes the corrected clusters also present in version 0.3.5?

aljpetri commented 3 weeks ago

Hi, version 0.3.5 does not have the deletion bug. Please run the code with --split_wrt_batches enabled.

NHoang98 commented 2 weeks ago

Hello, we did try version 0.3.5, but it seems like something went wrong (sadly). Several batches were processed but got killed shortly after. When I checked the stderr.txt file, it looks like "couldn't find tmp file" is the problem. Here are the terminal output (isONform_4Nov.zip; sorry, I had to zip it since it's too large for GitHub) and the stderr file (stderr.txt).

aljpetri commented 1 week ago

Hi, I am sorry for the long silence. I have found a bug in the isONform codebase that occurs when it is run without --split_wrt_batches, and I am currently rewriting the code so that isONform properly reads the files from isONcorrect and isONclust. I will post here as soon as I have fixed and tested everything. I am sorry that this is taking some time. Best, Alex

aljpetri commented 3 days ago

Hi, I have now fixed isONform so that it properly handles running without --split_wrt_batches. This should reduce the number of files; from your messages, I believe the number of files may be exceeding your system's upper limit. Version 0.3.8 is uploaded. Please feel free to try it out.

claumer commented 1 day ago

Hello,

Nice to see this discussion and these developments - I was having a similar problem as the users above (the pip default version of isONform was generating empty files).

I've tried version 0.3.8 - it unfortunately also seems to have a bug:

(isoncorrect) ~/isONform/isONform_parallel --fastq_folder correction2/ --t 8 --outfolder assembly2 --verbose
8
isONcorrect structure
Moved and renamed correction2/12345/corrected_reads.fastq to correction2/12345.fastq
Moved and renamed correction2/12298/corrected_reads.fastq to correction2/12298.fastq
Moved and renamed correction2/7407/corrected_reads.fastq to correction2/7407.fastq
... [many such lines]
Traceback (most recent call last):
  File "/home/claumer/isONform/isONform_parallel", line 380, in <module>
    main(args)
    ~~~~^^^^^^
  File "/home/claumer/isONform/isONform_parallel", line 218, in main
    restructure_isoncorrect_output(directory)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/claumer/isONform/isONform_parallel", line 52, in restructure_isoncorrect_output
    Parallelization_side_functions.remove_folders(directory)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/claumer/isONform/modules/Parallelization_side_functions.py", line 118, in remove_folders
    shutil.rmtree(os.path.join(outfolder,subfolder))
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/claumer/miniconda3/envs/isoncorrect/lib/python3.13/shutil.py", line 763, in rmtree
    _rmtree_safe_fd(stack, onexc)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/home/claumer/miniconda3/envs/isoncorrect/lib/python3.13/shutil.py", line 704, in _rmtree_safe_fd
    onexc(func, path, err)
    ~~~~~^^^^^^^^^^^^^^^^^
  File "/home/claumer/miniconda3/envs/isoncorrect/lib/python3.13/shutil.py", line 665, in _rmtree_safe_fd
    orig_st = os.lstat(name, dir_fd=dirfd)
FileNotFoundError: [Errno 2] No such file or directory: 'correction2/correction2/12345'

I notice that it will finish if I manually move all of the .fastq files to a new directory and call isONform again, pointing it at this new directory. But even then, although it writes a transcriptome.fasta file that looks correct, it halts with a FileNotFoundError, for some reason always associated with cluster #12345.
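
The path in the FileNotFoundError, correction2/correction2/12345, contains the input directory twice. One possible explanation, offered here only as a guess from the traceback line `shutil.rmtree(os.path.join(outfolder,subfolder))`, is that `subfolder` already carries the directory prefix when it is joined. The sketch below reproduces that doubling and shows a hypothetical defensive variant; `doubled_path` and `remove_subfolders` are illustrative names and not isONform's code.

```python
import os
import shutil


def doubled_path(outfolder, subfolder):
    """Show what os.path.join yields when `subfolder` already includes `outfolder`."""
    return os.path.join(outfolder, subfolder)


def remove_subfolders(outfolder, subfolders):
    """Hypothetical defensive variant: delete the named subfolders of `outfolder`.

    Only the last path component of each entry is used, so a name that
    already carries a directory prefix is not joined twice. Folders that
    do not exist are skipped instead of raising FileNotFoundError.
    """
    removed = []
    for sub in subfolders:
        name = os.path.basename(os.path.normpath(sub))
        target = os.path.join(outfolder, name)
        if os.path.isdir(target):
            shutil.rmtree(target)
            removed.append(target)
    return removed
```

On a POSIX system, `doubled_path("correction2", "correction2/12345")` gives `correction2/correction2/12345`, the same path shape as in the error message. Skipping missing folders would also cover the case described above, where the run completes and then fails on a cluster folder that no longer exists.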