foerstner-lab / READemption

A pipeline for the computational evaluation of RNA-Seq data
https://reademption.readthedocs.io

READemption aligner.sam found error #15

Closed TarsLi closed 1 year ago

TarsLi commented 5 years ago

Hi, I am trying to find TSS in my RNA-seq data with READemption, and I get an error. I have installed segemehl 0.2.0 on the PC and added it to the environment path. Could you please help me? Thanks.

my command and the errors

reademption align -p 50 READemption_analysis
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.6/dist-packages/reademptionlib/sambamconverter.py", line 11, in sam_to_bam
    if self._sam_file_is_empty(sam_path) is True:
  File "/usr/local/lib/python3.6/dist-packages/reademptionlib/sambamconverter.py", line 43, in _sam_file_is_empty
    for line in open(sam_path):
FileNotFoundError: [Errno 2] No such file or directory: 'READemption_analysis/output/align/alignments/37_tex_non_rRNA_alignments_primary_aligner.sam'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/reademption", line 327, in <module>
    main()
  File "/usr/local/bin/reademption", line 291, in main
    args.func(controller)
  File "/usr/local/bin/reademption", line 301, in align_reads
    controller.align_reads()
  File "/usr/local/lib/python3.6/dist-packages/reademptionlib/controller.py", line 95, in align_reads
    self._paths.primary_read_aligner_bam_paths)
  File "/usr/local/lib/python3.6/dist-packages/reademptionlib/controller.py", line 381, in _sam_to_bam
    self._check_job_completeness(jobs)
  File "/usr/local/lib/python3.6/dist-packages/reademptionlib/controller.py", line 588, in _check_job_completeness
    raise(job.exception())
FileNotFoundError: [Errno 2] No such file or directory: 'READemption_analysis/output/align/alignments/37_tex_non_rRNA_alignments_primary_aligner.sam'

Tillsa commented 5 years ago

Hi, could you please tell me which READemption version you are using (you can check with the command "reademption --version")? And could you please check whether the file READemption_analysis/output/align/alignments/37_tex_non_rRNA_alignments_primary_aligner.sam exists?

TarsLi commented 5 years ago

My READemption version is 0.4.3, and the file 37_tex_non_rRNA_alignments_primary_aligner.sam does not exist.
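Since the SAM file was never written, one thing that may be worth ruling out is that the aligner could not be started at all. The following is only a minimal sketch of such a check: it looks up an executable on the PATH as Python sees it. The binary name `segemehl.x` is an assumption and is not stated anywhere in this thread; replace it with whatever name your segemehl installation actually provides.

```python
import shutil


def find_on_path(executable):
    """Return the absolute path of an executable found on PATH, or None."""
    return shutil.which(executable)


def report(executables):
    """Map each executable name to its resolved path (None if not found)."""
    return {name: find_on_path(name) for name in executables}


# "segemehl.x" is an assumed binary name; adjust it to your installation.
for name, path in report(["segemehl.x"]).items():
    print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If the name is reported as not found, the PATH visible to Python differs from what was expected, for example because the PATH change was made in a different shell session.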

Tillsa commented 5 years ago

Are your read files in FASTA or FASTQ format?

TarsLi commented 5 years ago

In the above command, my read file is in FASTA format.

Tillsa commented 5 years ago

Could you please run the following three commands and post the output:

$ find READemption_analysis/

$ head READemption_analysis/input/reference/*

$ head READemption_analysis/input/annotations/*

TarsLi commented 5 years ago

lenovo@thinkstationp710:~/data/hcx/sortmeRNA/reademption$ find READemption_analysis/
READemption_analysis/
READemption_analysis/output
READemption_analysis/output/viz_gene_quanti
READemption_analysis/output/viz_align
READemption_analysis/output/coverage
READemption_analysis/output/coverage/coverage-raw
READemption_analysis/output/coverage/coverage-tnoar_min_normalized
READemption_analysis/output/coverage/coverage-tnoar_mil_normalized
READemption_analysis/output/deseq
READemption_analysis/output/deseq/deseq_with_annotations
READemption_analysis/output/deseq/deseq_raw
READemption_analysis/output/viz_deseq
READemption_analysis/output/gene_quanti
READemption_analysis/output/gene_quanti/gene_quanti_per_lib
READemption_analysis/output/gene_quanti/gene_quanti_combined
READemption_analysis/output/align
READemption_analysis/output/align/index
READemption_analysis/output/align/index/index.idx
READemption_analysis/output/align/unaligned_reads
READemption_analysis/output/align/reports_and_stats
READemption_analysis/output/align/reports_and_stats/stats_data_json
READemption_analysis/output/align/reports_and_stats/stats_data_json/read_processing.json
READemption_analysis/output/align/reports_and_stats/versions_of_used_libraries.txt
READemption_analysis/output/align/processed_reads
READemption_analysis/output/align/processed_reads/o_tex_non_rRNA_processed.fa.gz
READemption_analysis/output/align/processed_reads/37_tex_non_rRNA_processed.fa.gz
READemption_analysis/output/align/alignments
READemption_analysis/input
READemption_analysis/input/reads
READemption_analysis/input/reads/o_tex_non_rRNA.fa
READemption_analysis/input/reads/37_tex_non_rRNA.fa
READemption_analysis/input/reference_sequences
READemption_analysis/input/reference_sequences/YF2_assembly_RAST.fa
READemption_analysis/input/annotations
READemption_analysis/input/annotations/YF2_assembly_RAST.gff

lenovo@thinkstationp710:~/data/hcx/sortmeRNA/reademption$ head READemption_analysis/input/reference/
head: cannot open 'READemption_analysis/input/reference/' for reading: No such file or directory
lenovo@thinkstationp710:~/data/hcx/sortmeRNA/reademption$ head READemption_analysis/input/reference_sequences/*

fig|165179.420.peg.1
atgttagcaagtccaaaagccctctgggacaacagtcttttgctcataaaggacagtgta
acagagcagcaatataacacatggttcaagccaatcgtctttgaatcgtacaagccgtcg
acaaagactttgttggtgcaggttccgagtccgttcgtatacgagtacttggaacagaac
ttcgttgacttgttaagtaaggtgctgcatcgtaattttggtgaaggaatccgtctcact
tatcgtgttgtaaccgataaggagcataagctttctcaagatatagaggcagatccagac
gatgctgatatggcaaagcaaactcgtgagcgtgcccagcagacggctgcccagcctgcc
gctccccagcagcaggaagacattgatacacagttagacccgaagcttactttcaacaat
tatatggagggtgacagcaataagctgcctcgttccgtaggattgtctattgccgagcat
cccaataccacccagtttaacccaatgttcatttacggaccttcgggtagcggtaagacg

lenovo@thinkstationp710:~/data/hcx/sortmeRNA/reademption$ head READemption_analysis/input/annotations/*

gff-version 3

Chr1 FIG CDS 1 1416 . + 1 ID=fig|165179.420.peg.1;Name=Chromosomal replication initiator protein DnaA
Chr1 FIG CDS 1565 2158 . - 2 ID=fig|165179.420.peg.2;Name=Predicted thiamin transporter PnuT
Chr1 FIG CDS 2171 4531 . - 2 ID=fig|165179.420.peg.3;Name=Thiamin-regulated outer membrane receptor Omr1
Chr1 FIG CDS 4832 6160 . + 2 ID=fig|165179.420.peg.4;Name=Cell surface glycan-binding lipoprotein%2C utilization system for glycans and polysaccharides (PUL)%2C SusD family
Chr1 FIG CDS 6285 6440 . - 0 ID=fig|165179.420.peg.5;Name=hypothetical protein
Chr1 FIG CDS 6639 7790 . + 0 ID=fig|165179.420.peg.6;Name=hypothetical protein
Chr1 FIG CDS 7856 8167 . + 2 ID=fig|165179.420.peg.7;Name=hypothetical protein
Chr1 FIG CDS 8301 11834 . + 0 ID=fig|165179.420.peg.8;Name=hypothetical protein
Chr1 FIG CDS 11857 12000 . + 1 ID=fig|165179.420.peg.9;Name=hypothetical protein

Tillsa commented 5 years ago

I might have found the problem: the sequence IDs of the FASTA files have to be the same as the ones in the first column of the GFF3 files. For further reading you can visit READemption's tutorial on performing an example analysis: https://reademption.readthedocs.io/en/latest/example_analysis.html

Normally you can check the IDs of the FASTA files with:

$ grep ">" YF2_assembly_RAST.fa

But I think your FASTA file doesn't use the ">" symbol to mark replicons, which it should. The first line of your FASTA file is:

fig|165179.420.peg.1

So to find all replicon IDs you could try:

$ grep "|" YF2_assembly_RAST.fa

To find all the replicon IDs used in the gff annotation file you can use:

$ cut -f1 YF2_assembly_RAST.gff | sort | uniq

Then you need to adjust the replicon IDs in one of the files, either the FASTA file or the GFF file, so that they match.
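The grep and cut checks above can also be combined into one comparison. The following is a minimal sketch, not part of READemption: the function names are hypothetical, and it assumes that FASTA header lines start with ">", that the ID is the first whitespace-delimited word after it, and that the GFF file is tab-separated with "#" comment lines.

```python
def fasta_ids(fasta_path):
    """Collect the ID (first word after '>') of every FASTA header line."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gff_seqids(gff_path):
    """Collect the first column of every non-comment, non-empty GFF line."""
    ids = set()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            ids.add(line.rstrip("\n").split("\t", 1)[0])
    return ids


def compare_ids(fasta_path, gff_path):
    """Report which IDs are shared and which occur in only one of the files."""
    in_fasta = fasta_ids(fasta_path)
    in_gff = gff_seqids(gff_path)
    return {
        "shared": in_fasta & in_gff,
        "only_in_fasta": in_fasta - in_gff,
        "only_in_gff": in_gff - in_fasta,
    }


# Example call (file names taken from this thread):
# compare_ids("YF2_assembly_RAST.fa", "YF2_assembly_RAST.gff")
```

An empty "shared" set would mean that no replicon ID in the reference FASTA matches the first column of the annotation file.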

TarsLi commented 5 years ago

The sequence IDs of my FASTA files do begin with the ">" symbol. I think something went wrong when I copied the output of the three commands above, because I am not familiar with Markdown. However, there may be a problem with the example link on the old page, https://pythonhosted.org/READemption/example_analysis.html: I cannot download the example files offered on that page. I will try the example analysis from the new link, thank you.

Tillsa commented 5 years ago

If you want to do the example analysis, please go to this page: https://reademption.readthedocs.io/en/latest/example_analysis.html. It is the newer version, and the download paths for the files are correct on this page. Did you run the following command to check the IDs of the reference sequence?

$ grep ">" YF2_assembly_RAST.fa

In the output you sent me it doesn't look like the lines containing the replicon IDs start with ">" (I highlighted the part I am referring to further down)

lenovo@thinkstationp710:/data/hcx/sortmeRNA/reademption$ head READemption_analysis/input/reference/
head: cannot open 'READemption_analysis/input/reference/' for reading: No such file or directory
lenovo@thinkstationp710:/data/hcx/sortmeRNA/reademption$ head READemption_analysis/input/reference_sequences/*

fig|165179.420.peg.1
atgttagcaagtccaaaagccctctgggacaacagtcttttgctcataaaggacagtgta
acagagcagcaatataacacatggttcaagccaatcgtctttgaatcgtacaagccgtcg
acaaagactttgttggtgcaggttccgagtccgttcgtatacgagtacttggaacagaac
ttcgttgacttgttaagtaaggtgctgcatcgtaattttggtgaaggaatccgtctcact
tatcgtgttgtaaccgataaggagcataagctttctcaagatatagaggcagatccagac
gatgctgatatggcaaagcaaactcgtgagcgtgcccagcagacggctgcccagcctgcc
gctccccagcagcaggaagacattgatacacagttagacccgaagcttactttcaacaat
tatatggagggtgacagcaataagctgcctcgttccgtaggattgtctattgccgagcat
cccaataccacccagtttaacccaatgttcatttacggaccttcgggtagcggtaagacg

TarsLi commented 5 years ago

I am sure that the IDs of the reference sequence start with ">".

$ grep ">" YF2_assembly_RAST.fa | head
>fig|165179.420.peg.1
>fig|165179.420.peg.2
>fig|165179.420.peg.3
>fig|165179.420.peg.4
>fig|165179.420.peg.5
>fig|165179.420.peg.6
>fig|165179.420.peg.7
>fig|165179.420.peg.8
>fig|165179.420.peg.9
>fig|165179.420.peg.10

$ grep "|" YF2_assembly_RAST.fa | head
>fig|165179.420.peg.1
>fig|165179.420.peg.2
>fig|165179.420.peg.3
>fig|165179.420.peg.4
>fig|165179.420.peg.5
>fig|165179.420.peg.6
>fig|165179.420.peg.7
>fig|165179.420.peg.8
>fig|165179.420.peg.9
>fig|165179.420.peg.10

Tillsa commented 5 years ago

OK, that's good. But they don't seem to match the IDs used in the first column of the annotation file, which is "Chr1" in the lines you posted. You need to adjust them in order to use READemption.

TarsLi commented 5 years ago

OK, I will modify my file format according to the sample file. Thanks for your patience!

Tillsa commented 5 years ago

You are welcome. Please let us know if it worked, so we can close the issue.

Tillsa commented 5 years ago

It just occurred to me that different IDs in the GFF file and the reference genome only pose a problem during the gene quantification step, not during the alignment step, which is the one you are running right now. Nevertheless, adjusting the IDs will help you later on in the gene quantification step. Did you try the example analysis? If it worked, we can rule out a problem with your setup and have a closer look at the input files.