Open NHoang98 opened 3 weeks ago
Hi, thank you for reporting the error. Could you please send me the stderr.txt file that should have been generated in the outfolder of the problematic cluster (Documents/Gac/RNAseq/4.mapping/isONclust/Aril/isoform/A3/892)? This should give me the necessary information about what caused the problem.
Hi @aljpetri, thank you for the quick response. This is the report file from cluster 892: stderr.txt
Hello, from the file I can see that isONform was not able to locate the necessary file in the temporary folder. This might happen if the system limits the number of files that it allows to be generated. Do you know of any file number limits on your system? If a file limit is the actual problem, it could help to run isONform without the --split_wrt_batches argument. I hope this helps.
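To make the question about file limits concrete, here is a minimal sketch for checking two things on a Unix system: the per-process limit on open file descriptors, and how many files a folder currently holds. Both helper names are made up for illustration and are not part of isONform. Note the assumption: the limit in question could also be a filesystem quota, which this sketch does not check.

```python
import os
import resource  # Unix only


def open_file_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_files(folder):
    """Count regular files below `folder`, recursively."""
    total = 0
    for _root, _dirs, files in os.walk(folder):
        total += len(files)
    return total


# Usage (the path is a placeholder for the isONform outfolder):
#   soft, hard = open_file_limits()
#   print(f"open-file limit: soft={soft}, hard={hard}")
#   print("files in outfolder:", count_files("path/to/outfolder"))
```

If the file count in the outfolder comes close to a known limit while the run is failing, that would support the file-limit explanation. It would not prove it.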
Well, we just re-ran without --split_wrt_batches. The process finished after just a short time, but it turns out there are only empty files. Also, sadly, my correction folder has been removed (it's kinda weird tho). It's kinda stressful that we have to re-run all of the correction steps.
Here is the terminal output:
Printing instances
<multiprocessing.context.SpawnContext object at 0x7f2313995fd0>
Environment set: <multiprocessing.context.SpawnContext object at 0x7f2313995fd0>
Using 8 cores.
Time elapsed multiprocessing: 0.2520012855529785
Merging...
Batch Merging
Generating transcriptome.fasta
Joined back batched files in: 15.027465343475342
Finished full algo after : 15.307952642440796
Hi, thanks for letting me know. I found that the deletion of the corrected files was caused by a bug in the code, which I have now fixed in a new version (isONform 0.3.7). I am really sorry that you have to rerun the correction step. I am wondering, though, why the resulting files were all empty. Is it possible that the support for the predictions was too low, i.e. no prediction having more than 5 reads as support?
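One rough way to look into the support question is to count the reads in each corrected cluster, since a cluster with few reads cannot produce a well-supported prediction. The sketch below is an illustration and not isONform code. It assumes four-line FASTQ records and the folder layout `<correction_folder>/<cluster_id>/corrected_reads.fastq`. The function names are made up, and the threshold of 5 is taken from the comment above.

```python
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def cluster_sizes(correction_folder):
    """Map cluster id -> read count for <folder>/<id>/corrected_reads.fastq."""
    sizes = {}
    for fastq in Path(correction_folder).glob("*/corrected_reads.fastq"):
        sizes[fastq.parent.name] = count_fastq_reads(fastq)
    return sizes


def clusters_above(sizes, threshold=5):
    """Return the ids of clusters with more than `threshold` reads."""
    return sorted(cid for cid, n in sizes.items() if n > threshold)
```

Calling `clusters_above(cluster_sizes("path/to/correction"))` lists the clusters that could, in principle, pass such a cutoff. Cluster size is only an upper bound on the support of any single prediction.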
I think it's kinda weird because if I use --split_wrt_batches, it takes like 1 to 2 hours before it gets an error. But without it, the run finishes after like 5 mins.
FYI: we have about 81k corrected clusters (from 9.7M filtered reads), each supported by at least 3 reads (option -N in write_fastq in isONclust). We use the cDNA ver11 kit (SQK-PCB111.24), so I also set --k 9 --w 10 --max_seqs 1000 in isONcorrect. Could you suggest any options that might improve the result?
Sorry, just for clarification: did the algorithm finish successfully with --split_wrt_batches disabled?
Yes, this is the output from the run
Ok. Could you try running isONform in an earlier version (i.e. 0.3.5) for the dataset and see whether you have the same issues? The bug might be due to a code change with version 0.3.6. I am really sorry for the inconvenience.
We totally understand the situation. So to make it clear, should we re-run with or without the --split_wrt_batches option? Is the bug that deletes the corrected clusters also in ver 0.3.5?
Hi, version 0.3.5 does not have the deletion bug. Please run the code with --split_wrt_batches enabled.
Hello, we did try version 0.3.5, but it seems like something went wrong (sadly). Several batches were processed, but the run was killed shortly afterwards. When I checked the stderr.txt file, it looks like "couldn't find tmp file" is the problem. Here are the terminal output (isONform_4Nov.zip; sorry, I had to zip it since it's too large for GitHub) and the stderr file (stderr.txt).
Hi, I am sorry for the long silence. I have found a bug in the isONform codebase when it is run without --split_wrt_batches, and I am currently rewriting the code of isONform to properly read the files from isONcorrect and isONclust. I will post here as soon as I have fixed and tested everything. I am sorry that this is taking some time. Best, Alex
Hi, I have now fixed isONform so that it properly handles running without --split_wrt_batches. This should reduce the number of files, as from your messages I believe the number of files might be exceeding your system's upper threshold. Version 0.3.8 is uploaded. Please feel free to try it out.
Hello,
Nice to see this discussion and these developments - I was having a similar problem as the users above (the pip default version of isONform was generating empty files).
I've tried version 0.3.8 - it unfortunately also seems to have a bug:
(isoncorrect) ~/isONform/isONform_parallel --fastq_folder correction2/ --t 8 --outfolder assembly2 --verbose
8
isONcorrect structure
Moved and renamed correction2/12345/corrected_reads.fastq to correction2/12345.fastq
Moved and renamed correction2/12298/corrected_reads.fastq to correction2/12298.fastq
Moved and renamed correction2/7407/corrected_reads.fastq to correction2/7407.fastq
... [many such lines]
Traceback (most recent call last):
File "/home/claumer/isONform/isONform_parallel", line 380, in <module>
main(args)
~~~~^^^^^^
File "/home/claumer/isONform/isONform_parallel", line 218, in main
restructure_isoncorrect_output(directory)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
File "/home/claumer/isONform/isONform_parallel", line 52, in restructure_isoncorrect_output
Parallelization_side_functions.remove_folders(directory)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
File "/home/claumer/isONform/modules/Parallelization_side_functions.py", line 118, in remove_folders
shutil.rmtree(os.path.join(outfolder,subfolder))
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/claumer/miniconda3/envs/isoncorrect/lib/python3.13/shutil.py", line 763, in rmtree
_rmtree_safe_fd(stack, onexc)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
File "/home/claumer/miniconda3/envs/isoncorrect/lib/python3.13/shutil.py", line 704, in _rmtree_safe_fd
onexc(func, path, err)
~~~~~^^^^^^^^^^^^^^^^^
File "/home/claumer/miniconda3/envs/isoncorrect/lib/python3.13/shutil.py", line 665, in _rmtree_safe_fd
orig_st = os.lstat(name, dir_fd=dirfd)
FileNotFoundError: [Errno 2] No such file or directory: 'correction2/correction2/12345'
I notice that it will finish if I manually move all of the .fastq files to a new directory and call isONform again, pointing it at this new directory. But again, when it finishes, even though it writes a transcriptome.fasta file that looks correct, it halts with a FileNotFoundError, for some reason always associated with cluster # 12345.
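The path in the FileNotFoundError above repeats the input folder name (correction2/correction2/12345). One general way such a path can arise is when a relative entry that already carries the parent folder is joined onto that parent a second time. The sketch below only demonstrates that mechanism, plus a tolerant way to delete subfolders. It is not the isONform implementation and does not claim to identify the actual cause. `join_once` and `remove_subfolders` are hypothetical names.

```python
import os
import shutil


def join_once(parent, entry):
    """Join `entry` under `parent`, keeping only the last path component."""
    return os.path.join(parent, os.path.basename(os.path.normpath(entry)))


def remove_subfolders(parent):
    """Delete every direct subdirectory of `parent`; return their names."""
    removed = []
    for entry in os.listdir(parent):  # listdir yields bare names, no prefix
        path = os.path.join(parent, entry)
        if os.path.isdir(path):
            # ignore_errors=True tolerates folders that vanish mid-run
            shutil.rmtree(path, ignore_errors=True)
            removed.append(entry)
    return sorted(removed)


# An entry that already contains the parent prefix doubles the path:
doubled = os.path.join("correction2", "correction2/12345")
# Keeping only the last component avoids the repetition:
fixed = join_once("correction2", "correction2/12345")
```

Whether to ignore a missing folder silently or to report it is a design choice. During cleanup after a successful run, tolerating it is usually harmless, but it can also hide a real path bug.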
Hello, we are currently running our de novo transcriptome assembly project. We ran isONclust and isONcorrect separately instead of using the isON_pipeline.sh pipeline. Everything was perfectly fine until the last step, isONform. While running isONform we encountered an error, which is:
The package was installed by pip in a newly created anaconda environment. The analysis was run via:
isONform_parallel --fastq_folder Documents/Gac/RNAseq/4.mapping/isONclust/Aril/polish/A3 --outfolder Documents/Gac/RNAseq/4.mapping/isONclust/Aril/isoform/A3 --split_wrt_batches
and got canceled after a few hundred batches. FYI: we run our analysis on a local workstation.
Could you please let me know how to fix this error? Thanks