Hi!
You can use conda to install segemehl 0.3.4 (which is required for READemption version 0.6.0):
conda install -c bioconda segemehl=0.3.4
Also make sure that segemehl, or rather the conda bin folder, is on your PATH. You can do so by adding a line similar to this one
export PATH="$HOME/anaconda/bin:$PATH"
to your bash profile file, as in the answer to this Stack Overflow question:
https://stackoverflow.com/questions/35076536/i-have-to-type-export-path-anaconda-binpath-everytime-i-rerun-the-terminal
Best regards,
Till
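After editing the bash profile, you can confirm that the tools are actually visible on your PATH with a small check like the one below. This is only a sketch: `check_tools` is a hypothetical helper name, and the executable names in the usage comment (`segemehl.x`, `samtools`, `reademption`) are assumptions based on the commands quoted elsewhere in this thread.

```shell
# Hypothetical helper: report where each named executable is found on PATH.
# Returns 1 if at least one of them is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch (executable names are assumptions):
#   check_tools segemehl.x samtools reademption
```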
Thanks a lot for the response. I have now downloaded the right version, but I'm still trying to follow these installation instructions after decompressing:
cd segemehl_*/segemehl/ && make && cd ../../
and it reports that there is no such file or directory. Is this step different for version 0.3.4 of segemehl?
Best, Rubén
After installing segemehl via conda, you don't need to use the make command. However, if you want to download and build segemehl from https://www.bioinf.uni-leipzig.de/Software/segemehl/, you need to make sure you are in the folder where segemehl's Makefile is located. Best, Till
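If you do build from source, the unpacked folder layout may differ from the `segemehl_*/segemehl/` path in the old instructions, so it can help to locate the Makefile first. A minimal sketch: `find_makefile_dirs` is a hypothetical helper, and the folder name in the usage comment is an assumption; use whatever your downloaded archive unpacks to.

```shell
# Hypothetical helper: list the directories (up to 3 levels below $1)
# that contain a Makefile, i.e. the places where `make` can be run.
find_makefile_dirs() {
  find "${1:-.}" -maxdepth 3 -type f -name Makefile -exec dirname {} \; | sort
}

# Usage sketch (folder name is an assumption):
#   cd "$(find_makefile_dirs segemehl-0.3.4 | head -n 1)" && make
```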
Thanks a million for your help! Best, Rubén
Please run
$ reademption --version
and let me know which version you are using.
READemption version 0.6.0
Ok, that is the right version. Can you please post the output of
find READemption_analysis
pvc@DamienPC:~/Escritorio/Ruben$ find READemption_analysis
READemption_analysis
READemption_analysis/input
READemption_analysis/input/reference_sequences
READemption_analysis/input/reference_sequences/NC_017718.fa
READemption_analysis/input/reference_sequences/NC_017720.fa
READemption_analysis/input/reference_sequences/NC_017719.fa
READemption_analysis/input/reference_sequences/NC_016810.fa
READemption_analysis/input/annotations
READemption_analysis/input/annotations/NC_017718.gff
READemption_analysis/input/annotations/NC_017720.gff
READemption_analysis/input/annotations/NC_017719.gff
READemption_analysis/input/annotations/NC_016810.gff
READemption_analysis/input/reads
READemption_analysis/input/reads/InSPI2_R1.fa.bz2
READemption_analysis/input/reads/InSPI2_R2.fa.bz2
READemption_analysis/input/reads/LSP_R1.fa.bz2
READemption_analysis/input/reads/LSP_R2.fa.bz2
READemption_analysis/output
READemption_analysis/output/align
READemption_analysis/output/align/unaligned_reads
READemption_analysis/output/align/processed_reads
READemption_analysis/output/align/processed_reads/InSPI2_R1_processed.fa.gz
READemption_analysis/output/align/processed_reads/LSP_R1_processed.fa.gz
READemption_analysis/output/align/processed_reads/InSPI2_R2_processed.fa.gz
READemption_analysis/output/align/processed_reads/LSP_R2_processed.fa.gz
READemption_analysis/output/align/alignments
READemption_analysis/output/align/index
READemption_analysis/output/align/index/index.idx
READemption_analysis/output/align/reports_and_stats
READemption_analysis/output/align/reports_and_stats/version_log.txt
READemption_analysis/output/align/reports_and_stats/stats_data_json
READemption_analysis/output/align/reports_and_stats/stats_data_json/read_processing.json
READemption_analysis/output/viz_gene_quanti
READemption_analysis/output/viz_align
READemption_analysis/output/coverage
READemption_analysis/output/coverage/coverage-tnoar_mil_normalized
READemption_analysis/output/coverage/coverage-raw
READemption_analysis/output/coverage/coverage-tnoar_min_normalized
READemption_analysis/output/deseq
READemption_analysis/output/deseq/deseq_raw
READemption_analysis/output/deseq/deseq_with_annotations
READemption_analysis/output/viz_deseq
READemption_analysis/output/gene_quanti
READemption_analysis/output/gene_quanti/gene_quanti_per_lib
READemption_analysis/output/gene_quanti/gene_quanti_combined
I just ran the tutorial on my machine and had no problems. I used a bash script for this, which I have uploaded: run_reademption_tutorial.sh.
Could you please use the script to run the example analysis? You need to change lines 2 and 3, which read:
readonly READEMPTION=/home/till/Documents/READemption_developing/0.6.0/READemption/bin/reademption
and
readonly READEMPTION_ANALYSIS_FOLDER=READemption_analysis
according to your system.
Does the error persist?
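Instead of editing lines 2 and 3 by hand, they can be rewritten with `sed`. This is a sketch that assumes each variable is assigned on a single line beginning with `readonly NAME=`, as in the two lines quoted above; `set_readonly_var` is a hypothetical helper name. Filling in the path with `command -v reademption` assumes reademption is already on your PATH, and saves you from guessing where the executable was installed. A backup of the script is kept with a `.bak` suffix.

```shell
# Hypothetical helper: rewrite the line "readonly NAME=..." in a script in place.
# A backup copy is written to "<script>.bak".
set_readonly_var() {
  local script=$1 name=$2 value=$3 escaped
  # Escape characters that are special in a sed replacement using "|" as
  # the delimiter: backslash, ampersand and the delimiter itself.
  escaped=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
  sed -i.bak "s|^readonly ${name}=.*|readonly ${name}=${escaped}|" "$script"
}

# Usage sketch:
#   set_readonly_var run_reademption_tutorial.sh READEMPTION "$(command -v reademption)"
#   set_readonly_var run_reademption_tutorial.sh READEMPTION_ANALYSIS_FOLDER READemption_analysis
```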
OK, this is getting awkward for me... I'm running the script with line 2 replaced by
readonly READEMPTION=/home/pvc/Escritorio/Ruben/
which is where I'm doing the analysis, but it reports a warning at line 20 saying that the file or directory does not exist. Does that refer to the reademption executable? I ask because I cannot find it in that directory after the installation.
Thanks a lot for your patience. Rubén
I think the right line in your case would be similar to this one:
readonly READEMPTION=/home/pvc/Escritorio/Ruben/READemption/bin/reademption
It reports the same error in both cases. Could I have done something wrong during the installation?
I don't think you have done anything wrong during installation. But you could post all the commands you ran to install it, and I will have a look and let you know if something is missing.
Maybe the following issue, where someone had a similar error, will help you: https://github.com/PacificBiosciences/FALCON_unzip/issues/48 "I have had this exact same problem recently. It turned out to be that the system limits the number of opened files. After increasing the limit, the error is gone."
Best, Till
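To check the open-file limit mentioned in that issue, the shell builtin `ulimit` reports both the soft limit (the value currently enforced) and the hard limit (the ceiling up to which an unprivileged user may raise the soft limit). The helper names below are hypothetical. Raising the limit this way only affects the current shell session and programs started from it, such as reademption; whether the limit is actually the cause of this error is not established in this thread.

```shell
# Hypothetical helper: print the soft and hard limits on open file descriptors.
show_open_file_limits() {
  echo "soft: $(ulimit -S -n)"
  echo "hard: $(ulimit -H -n)"
}

# Hypothetical helper: raise the soft limit to the hard limit for the current
# shell session only; child processes inherit the new value.
raise_open_file_limit() {
  local hard
  hard=$(ulimit -H -n)
  ulimit -S -n "$hard"
}

# Usage sketch:
#   show_open_file_limits
#   raise_open_file_limit && reademption align -f READemption_analysis
```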
There might be something I missed. After updating segemehl, I tried to run these lines from the installation instructions, updating them to the new version name (segemehl-0.3.4):
sudo cp segemehl_0_2_0/segemehl/segemehl.x /usr/bin/segemehl.x
sudo cp segemehl_0_2_0/segemehl/lack.x /usr/bin/lack.x
But it did not work (it says that the file or directory does not exist). Could that be the problem? As far as I can tell, it is the only step where my setup differs from the instructions.
Thanks a lot for your patience. Best, Rubén
You don't need to run the two lines you mentioned after installing segemehl via conda. I guess it's a problem with your system. Maybe try installing READemption on a different system or in a VM. And I still recommend having a look at the issue I linked above (regarding the limit on open files). Best, Till
Finally, I fixed the readonly lines, found the right location of everything, updated everything and tried to run the example analysis from the script. At first I got this:
pvc@DamienPC:~/Escritorio/Ruben$ bash run_reademption_tutorial.sh
run_reademption_tutorial.sh: line 5: create_project: command not found
run_reademption_tutorial.sh: line 6: store_environment_variable: command not found
run_reademption_tutorial.sh: line 7: download_fasta: command not found
run_reademption_tutorial.sh: line 8: modify_fasta_header: command not found
run_reademption_tutorial.sh: line 9: download_annotation: command not found
run_reademption_tutorial.sh: line 10: download_reads: command not found
So I deleted those commands from the script and set up the analysis folder myself. Then I got the following, which is the same error I get when running it manually:
pvc@DamienPC:~/Escritorio/Ruben$ bash run_reademption_tutorial.sh
[E::hts_open_format] Failed to open file "READemption_analysis/output/align/alignments/InSPI2_R1_alignments_final.bam" : No such file or directory
Traceback (most recent call last):
File "/usr/local/bin/reademption", line 320, in
First, I don't know whether it is a problem that it cannot find and open an output file (failed to open "READemption_analysis/output/align/alignments/InSPI2_R1_alignments_final.bam" for reading: No such file or directory) that should be created during the run; does that make sense? From the rest of the errors, it seems that it fails to find a few things within the project directory. Might there be a library or other dependency that I am missing? I have checked the limit on open files and it is 4096; I don't know whether that is enough to run it. As for the VM, I tried to run READemption in one some time ago and it could not handle the processing, which is why I was trying it on a Linux computer.
Thanks a million for your patience. Best, Rubén
@Tillsa Hi
I am facing the same issue with my own data files, although the example data set works fine. I have reproduced the steps according to the example data set instructions, but I am still getting the error: pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=None, stderr=samtools view: failed to open ".//output/align/alignments/SRR10957259_alignments_final.bam" for reading: No such file or directory\n'
My terminal session is as follows:
(base) dverma2@crb-5zx8nd2:~/Desktop/readmpt_fn$ reademption align -q -g -f ./
[SEGEMEHL] Thu Jul 29 14:22:29 2021: reading database sequences.
[SEGEMEHL] Thu Jul 29 14:22:49 2021: 61 database sequences found.
[SEGEMEHL] Thu Jul 29 14:22:49 2021: total length of db sequences: 2728222451
[SEGEMEHL] Thu Jul 29 14:22:49 2021: assigning all reads to default read group 'A1'.
[SEGEMEHL] Thu Jul 29 14:22:49 2021: additional read group default values ' SM:sample1 LB:library1 PU:unit1 PL:illumina'
[SEGEMEHL] Thu Jul 29 14:22:49 2021: reads assigned to read group 'A1'
[SEGEMEHL] Thu Jul 29 14:22:49 2021: compiled sam header.
[SEGEMEHL] Thu Jul 29 14:22:54 2021: reading queries in './/output/align/processed_reads/SRR10957259_processed.fa.gz'.
[SEGEMEHL] Thu Jul 29 14:24:19 2021: 40210880 query sequences found.
[SEGEMEHL] Thu Jul 29 14:24:19 2021: reading database sequences.
[SEGEMEHL] Thu Jul 29 14:24:40 2021: 61 database sequences found.
[SEGEMEHL] Thu Jul 29 14:24:40 2021: total length of db sequences: 2728222451
[SEGEMEHL] Thu Jul 29 14:24:40 2021: assigning all reads to default read group 'A1'.
[SEGEMEHL] Thu Jul 29 14:24:40 2021: additional read group default values ' SM:sample1 LB:library1 PU:unit1 PL:illumina'
[SEGEMEHL] Thu Jul 29 14:24:40 2021: reads assigned to read group 'A1'
[SEGEMEHL] Thu Jul 29 14:24:40 2021: compiled sam header.
[SEGEMEHL] Thu Jul 29 14:24:40 2021: reading suffix array './/output/align/index/index.idx' from disk.
[E::hts_open_format] Failed to open file ".//output/align/alignments/SRR10957259_alignments_final.bam" : No such file or directory
Traceback (most recent call last):
File "/home/WIN/dverma2/anaconda3/bin/reademption", line 315, in
Versions of supporting tools:
Program: samtools (Tools for alignments in the SAM format) Version: 1.10 (using htslib 1.13)
READemption version: 1.0.1
Python version: 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 10.3.0]
Biopython version: 1.79
pysam version: 0.16.0.1
matplotlib version: 3.3.0
pandas version: 1.3.1
segemehl-0.3.4
Contents of read_processing.json:
{
  "SRR10957259": {
    "total_no_of_reads": 40210880,
    "polya_removed": 0,
    "single_a_removed": 0,
    "unmodified": 40210880,
    "too_short": 0,
    "long_enough": 40210880,
    "read_length_before_processing_and_freq": {
      "76": 40210880
    },
    "read_length_after_processing_and_freq": {
      "76": 40210880
    }
  }
}
dverma2@crb-5zx8nd2:~/Desktop/readmpt_fn/output/align/processed_reads$ head SRR10957259_processed.fa
SRR10957259.1 1 length=76
GGCAANTCTCAGACAGCAGGGCTTCTACTGGTCTTTCAGATCCTTCAGTCTTCTNNTGGCAGACTTCANTGTGACT
SRR10957259.2 2 length=76
GTCAGNAGCACGACTTGATCTTCGGGGGCAATGCCTTCCAGGGAGGCCACATGANNTTTGATCTGGGCNACCGTCT
SRR10957259.3 3 length=76
GTCCGNAGTACACAATTTCCCCGGATGACTTCTTCATCTTCTTCAGCTGTGACANNAAGTACCAGAAGNGGGACTT
SRR10957259.4 4 length=76
CTCAGNTTCCAGTTCTTGCTTCATCTTGGCAAACTCTTCTTTTGTCATAGATCCTNCCCCTTCTCCCAGTTTCTGC
SRR10957259.5 5 length=76
CCAGCNGCAAGATTAACGCAACCTTCGAGCTTCTCTTTCTGACTCCAATAGGGTGNGCACGTCACCCTCTCGAACG
Hi @deppworld,
I just saw that your align command uses the option "-q", which is only required for FASTQ files. Your reads are in FASTA format. If you remove the option "-q" from your command, it should work.
Best wishes,
Till
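Since `-q` should only be passed for FASTQ input, it can help to check which format a read file is actually in before choosing the option. A minimal sketch, assuming the usual convention that FASTA records start with `>` and FASTQ records with `@`; `read_format` is a hypothetical helper that only inspects the very first character, so it is a quick heuristic rather than a validator.

```shell
# Hypothetical helper: guess the read format from the first character of a
# file: '>' means FASTA, '@' means FASTQ. Handles plain, .gz and .bz2 files.
read_format() {
  local file=$1 first
  case "$file" in
    *.gz)  first=$(gzip -dc -- "$file" | head -c 1) ;;
    *.bz2) first=$(bzip2 -dc -- "$file" | head -c 1) ;;
    *)     first=$(head -c 1 -- "$file") ;;
  esac
  case "$first" in
    '>') echo fasta ;;
    '@') echo fastq ;;
    *)   echo unknown; return 1 ;;
  esac
}

# Usage sketch: only add -q when the reads are FASTQ.
#   read_format readmpt_fn/input/reads/SRR10957259.fa
```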
Hi Till, I tried both ways (with and without -q), but I'm still getting the same error. This is mouse RNA-seq data, and the files are not big enough to cause a hardware issue. Could you please take a look and help me troubleshoot it? I thought it might be a samtools error, but samtools is working fine.
(base) dverma2@crb-5zx8nd2:~/Desktop$ reademption align -f readmpt_fn/
[E::hts_open_format] Failed to open file "readmpt_fn//output/align/alignments/SRR10957259_alignments_final.bam" : No such file or directory
Traceback (most recent call last):
File "/home/WIN/dverma2/anaconda3/bin/reademption", line 315, in
Can you please try:
$ reademption align -f readmpt_fn
That is, without the "/" at the end of the project path.
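If the project path comes from a script or variable, trailing slashes can be stripped before it is passed to `reademption align -f`. Note that a doubled slash such as `.//output/...` is normally still a valid path on Linux (segemehl reads `.//output/align/processed_reads/...` successfully in the log above), so this is a tidiness measure rather than a confirmed fix. `strip_trailing_slashes` is a hypothetical helper name.

```shell
# Hypothetical helper: remove trailing slashes from a path, keeping "/" itself.
strip_trailing_slashes() {
  local path=$1
  while [ "${#path}" -gt 1 ] && [ "${path%/}" != "$path" ]; do
    path=${path%/}
  done
  printf '%s\n' "$path"
}

# Usage sketch:
#   reademption align -f "$(strip_trailing_slashes readmpt_fn/)"
```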
Did you try the tutorial? If it worked, we can rule out problems due to your system and the installed packages. Another thing I noticed is that the header lines in your FASTA read files don't start with ">". You could create a very small dummy file with 10 reads, add ">" in front of each header line and see whether that solves the problem.
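The dummy-file test suggested above could be sketched as follows. It assumes the read file alternates strictly between one header line and one sequence line (as in the SRR10957259 excerpts in this thread), so it would mangle files with wrapped sequences. `make_dummy_fasta` and the output file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: write the first N reads (default 10) of a two-line
# FASTA-like file to stdout, adding ">" to header lines that lack it.
# Assumes odd lines are headers and even lines are sequences.
make_dummy_fasta() {
  local input=$1 n=${2:-10}
  head -n "$((2 * n))" -- "$input" |
    awk 'NR % 2 == 1 && substr($0, 1, 1) != ">" { $0 = ">" $0 } { print }'
}

# Usage sketch:
#   make_dummy_fasta readmpt_fn/input/reads/SRR10957259.fa 10 > dummy_10_reads.fa
```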
Hi Tillsa, thanks for your prompt reply. I followed both of your suggestions above; this time I created a demo file in FASTA format and ran the same command, but I'm still getting the same error. I have also checked samtools by running it separately. I still could not resolve the BAM file error. Comparing my output with the tutorial's result folder, I found that no index file and no BAM file had been created.
eadmpt_fn/
├── input
│   ├── annotations
│   │   └── SRR10957259.gff3
│   ├── reads
│   │   └── SRR10957259.fa
│   └── reference_sequences
│       └── SRR10957259.fa
└── output
    ├── align
    │   ├── alignments
    │   ├── index
    │   ├── processed_reads
    │   │   └── SRR10957259_processed.fa.gz
    │   ├── reports_and_stats
    │   │   ├── stats_data_json
    │   │   │   └── read_processing.json
    │   │   └── version_log.txt
    │   └── unaligned_reads
    │       └── SRR10957259_unaligned.fa
    ├── coverage
    │   ├── coverage-raw
    │   ├── coverage-tnoar_mil_normalized
    │   └── coverage-tnoar_min_normalized
    ├── deseq
    │   ├── deseq_raw
    │   └── deseq_with_annotations
    ├── gene_quanti
    │   ├── gene_quanti_combined
    │   └── gene_quanti_per_lib
    ├── viz_align
    ├── viz_deseq
    └── viz_gene_quanti
Please suggest
Thanks
Did you try the tutorial?
Yes, I have run the tutorial with the test samples and it worked fine, but this error occurs when I use my own samples. Would it be possible for you to check this on your system with SRR10957259?
READemption_analysis/
├── input
│   ├── annotations
│   │   ├── NC_016810.gff
│   │   ├── NC_017718.gff
│   │   ├── NC_017719.gff
│   │   └── NC_017720.gff
│   ├── reads
│   │   ├── InSPI2_R1.fa.bz2
│   │   ├── InSPI2_R2.fa.bz2
│   │   ├── LSP_R1.fa.bz2
│   │   └── LSP_R2.fa.bz2
│   └── reference_sequences
│       ├── NC_016810.fa
│       ├── NC_017718.fa
│       ├── NC_017719.fa
│       └── NC_017720.fa
└── output
    ├── align
    │   ├── alignments
    │   │   ├── InSPI2_R1_alignments_final.bam
    │   │   ├── InSPI2_R1_alignments_final.bam.bai
    │   │   ├── InSPI2_R2_alignments_final.bam
    │   │   ├── InSPI2_R2_alignments_final.bam.bai
    │   │   ├── LSP_R1_alignments_final.bam
    │   │   ├── LSP_R1_alignments_final.bam.bai
    │   │   ├── LSP_R2_alignments_final.bam
    │   │   └── LSP_R2_alignments_final.bam.bai
    │   ├── index
    │   │   └── index.idx
    │   ├── processed_reads
    │   │   ├── InSPI2_R1_processed.fa.gz
    │   │   ├── InSPI2_R2_processed.fa.gz
    │   │   ├── LSP_R1_processed.fa.gz
    │   │   └── LSP_R2_processed.fa.gz
    │   ├── reports_and_stats
    │   │   ├── read_alignment_stats.csv
    │   │   ├── stats_data_json
    │   │   │   ├── read_alignments_final.json
    │   │   │   └── read_processing.json
    │   │   └── version_log.txt
    │   └── unaligned_reads
    │       ├── InSPI2_R1_unaligned.fa
    │       ├── InSPI2_R2_unaligned.fa
    │       ├── LSP_R1_unaligned.fa
    │       └── LSP_R2_unaligned.fa
    ├── coverage
    │   ├── coverage-raw
    │   ├── coverage-tnoar_mil_normalized
    │   └── coverage-tnoar_min_normalized
    ├── deseq
    │   ├── deseq_raw
    │   └── deseq_with_annotations
    ├── gene_quanti
    │   ├── gene_quanti_combined
    │   └── gene_quanti_per_lib
    ├── viz_align
    ├── viz_deseq
    └── viz_gene_quanti
In my case, I am not getting the BAM and index files.
Thanks Deepak
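To compare a project folder with the tutorial output tree above in one step, a sketch like the following checks for the suffix-array index and for one `_alignments_final.bam` per processed read file. The file naming is taken from the tutorial's output tree in this thread; `check_align_outputs` is a hypothetical helper, and it only checks that the files exist and are non-empty, not that they are valid.

```shell
# Hypothetical helper: check the files the align step is expected to leave in
# a READemption project folder (naming taken from the tutorial output tree).
check_align_outputs() {
  local project="${1%/}"
  local align="$project/output/align"
  local status=0 reads lib
  # The suffix-array index that segemehl reads before aligning.
  if [ -s "$align/index/index.idx" ]; then
    echo "index: ok"
  else
    echo "index: missing or empty ($align/index/index.idx)"
    status=1
  fi
  # One final BAM per processed read file: <lib>_alignments_final.bam.
  for reads in "$align"/processed_reads/*_processed.fa.gz; do
    if [ ! -e "$reads" ]; then
      echo "no processed reads found in $align/processed_reads"
      return 1
    fi
    lib=$(basename "$reads" _processed.fa.gz)
    if [ -s "$align/alignments/${lib}_alignments_final.bam" ]; then
      echo "$lib: bam ok"
    else
      echo "$lib: bam missing or empty"
      status=1
    fi
  done
  return "$status"
}

# Usage sketch:
#   check_align_outputs readmpt_fn
```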
And did you fix the fasta headers?
Yes:
(base) dverma2@crb-5zx8nd2:~/Desktop/readmpt_fn/input/reads$ head SRR10957259.fa
SRR10957259.1 1 length=76
GGCAANTCTCAGACAGCAGGGCTTCTACTGGTCTTTCAGATCCTTCAGTCTTCTNNTGGCAGACTTCANTGTGACT
SRR10957259.2 2 length=76
GTCAGNAGCACGACTTGATCTTCGGGGGCAATGCCTTCCAGGGAGGCCACATGANNTTTGATCTGGGCNACCGTCT
SRR10957259.3 3 length=76
GTCCGNAGTACACAATTTCCCCGGATGACTTCTTCATCTTCTTCAGCTGTGACANNAAGTACCAGAAGNGGGACTT
SRR10957259.4 4 length=76
CTCAGNTTCCAGTTCTTGCTTCATCTTGGCAAACTCTTCTTTTGTCATAGATCCTNCCCCTTCTCCCAGTTTCTGC
SRR10957259.5 5 length=76
CCAGCNGCAAGATTAACGCAACCTTCGAGCTTCTCTTTCTGACTCCAATAGGGTGNGCACGTCACCCTCTCGAACG
I am thinking of reinstalling the package and rerunning the pipeline. Hopefully that will resolve the issue.
You need to add ">" in front of each header line to get a Fasta file that looks like this:
>SRR10957259.1 1 length=76
GGCAANTCTCAGACAGCAGGGCTTCTACTGGTCTTTCAGATCCTTCAGTCTTCTNNTGGCAGACTTCANTGTGACT
>SRR10957259.2 2 length=76
GTCAGNAGCACGACTTGATCTTCGGGGGCAATGCCTTCCAGGGAGGCCACATGANNTTTGATCTGGGCNACCGTCT
>SRR10957259.3 3 length=76
GTCCGNAGTACACAATTTCCCCGGATGACTTCTTCATCTTCTTCAGCTGTGACANNAAGTACCAGAAGNGGGACTT
>SRR10957259.4 4 length=76
CTCAGNTTCCAGTTCTTGCTTCATCTTGGCAAACTCTTCTTTTGTCATAGATCCTNCCCCTTCTCCCAGTTTCTGC
>SRR10957259.5 5 length=76
CCAGCNGCAAGATTAACGCAACCTTCGAGCTTCTCTTTCTGACTCCAATAGGGTGNGCACGTCACCCTCTCGAACG
Yes, the ">" is there, but it disappears when I paste the file here because of the page formatting. Can you tell me the minimum hardware configuration required for this pipeline? I am using a system with 8 cores and 32 GB RAM. This is the complete file; I used your tutorial command to create it.
SRR10957259.1 1 length=76
GGCAANTCTCAGACAGCAGGGCTTCTACTGGTCTTTCAGATCCTTCAGTCTTCTNNTGGCAGACTTCANTGTGACT
SRR10957259.2 2 length=76
GTCAGNAGCACGACTTGATCTTCGGGGGCAATGCCTTCCAGGGAGGCCACATGANNTTTGATCTGGGCNACCGTCT
SRR10957259.3 3 length=76
GTCCGNAGTACACAATTTCCCCGGATGACTTCTTCATCTTCTTCAGCTGTGACANNAAGTACCAGAAGNGGGACTT
SRR10957259.4 4 length=76
CTCAGNTTCCAGTTCTTGCTTCATCTTGGCAAACTCTTCTTTTGTCATAGATCCTNCCCCTTCTCCCAGTTTCTGC
SRR10957259.5 5 length=76
CCAGCNGCAAGATTAACGCAACCTTCGAGCTTCTCTTTCTGACTCCAATAGGGTGNGCACGTCACCCTCTCGAACG
SRR10957259.6 6 length=76
ATATGNGCATCTCCAGTCTCCACTGTCAACTGTGAGTTGATGGCCTCAAAGCTGGNGTTCTCCAATAGCTTCATGT
SRR10957259.7 7 length=76
GCCACNCTGGCACATGAATCCTGGAATAATTCTGTGAAAGGAGGAACCCTTATAGCCAAATCCTTTCTCTCCAGTG
SRR10957259.8 8 length=76
CTCTTNTCCAAGTGCAGTGCACACTCCATTGCATTCAGCCCGCTCTCCCAGTCATCACGGTCTGGTTTCTTTATAT
SRR10957259.9 9 length=76
CGGGAATGGACAGTCACAGGCTTGCGGATGATCAGCCCATCCTTGATCAGCTTCCTGATCTGCTGACGGGAGTTGG
SRR10957259.10 10 length=76
GTGCTTAATCTGCTCTGCAGCTCCAGTCATAAAAGGCTTTACTCTTTCTGGTTTCTGCTCTTCAAGTTTGCCTTTG
8 cores and 32 GB RAM is fine. If the tutorial runs without errors but running READemption with your input files (reads and reference sequences in FASTA) raises an error, I assume that your input files are corrupted or don't meet the FASTA specification. You could try to validate them here: https://plabipd.de/portal/mercator-fasta-validator
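As an offline complement to the online validator, a very small check can catch the most common formatting problems. It is a sketch, not a full FASTA validator: it accepts only IUPAC nucleotide letters on sequence lines, reports Windows line endings (a trailing carriage return) as unexpected characters, and does not read compressed files. `fasta_sanity_check` is a hypothetical helper name, and whether segemehl is stricter or more lenient than this check is not established here.

```shell
# Hypothetical helper: minimal FASTA sanity check. Prints "ok" and returns 0
# if no problem is found; otherwise prints one line per problem and returns 1.
fasta_sanity_check() {
  awk '
    BEGIN { bad = 0; prev = 0; seen_seq = 0 }
    /^>/ {
      # New header: the previous header must have had a sequence line.
      if (prev && !seen_seq) { printf "line %d: header without sequence\n", prev; bad = 1 }
      prev = NR; seen_seq = 0; next
    }
    NR == 1 { print "line 1: file does not start with \">\""; bad = 1; exit }
    /^$/ { printf "line %d: empty line\n", NR; bad = 1; next }
    { seen_seq = 1 }
    !/^[ACGTUNRYKMSWBDHVacgtunrykmswbdhv]+$/ {
      printf "line %d: unexpected characters\n", NR; bad = 1
    }
    END {
      if (NR == 0) { print "empty file"; exit 1 }
      if (prev && !seen_seq) { printf "line %d: header without sequence\n", prev; bad = 1 }
      if (!bad) print "ok"
      exit bad
    }
  ' "$1"
}

# Usage sketch (run it on both reads and reference sequences):
#   fasta_sanity_check readmpt_fn/input/reads/SRR10957259.fa
```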
Hey, I wanted to report that I'm getting the SamtoolsError (pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=None, stderr=samtools view: failed to open "READemption_analysis/output/align/alignments/file.bam" for reading: No such file or directory\n') using the tutorial data and the latest Docker image. I haven't solved it yet. I noticed that the segemehl version installed in the image is 0.2.0-418.
Hello @kmuench,
Could you please provide the stdout messages of the entire analysis? Did you run the tutorial from https://github.com/Tillsa/READemption_Docker_Tutorial via the following command?
$ bash run_tutorial.sh all
Hi everyone! I'm trying to run READemption for the first time on Ubuntu with the example data (also, I'm not an expert in bioinformatics) and I'm getting this error:
[E::hts_open_format] Failed to open file "READemption_analysis/output/align/alignments/InSPI2_R1_alignments_final.bam" : No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/reademption", line 320, in
    main()
  File "/usr/local/bin/reademption", line 284, in main
    args.func(controller)
  File "/usr/local/bin/reademption", line 294, in align_reads
    controller.align_reads()
  File "/home/pvc/.local/lib/python3.6/site-packages/reademptionlib/controller.py", line 81, in align_reads
    self._align_single_end_reads()
  File "/home/pvc/.local/lib/python3.6/site-packages/reademptionlib/controller.py", line 328, in _align_single_end_reads
    paired_end=False)
  File "/home/pvc/.local/lib/python3.6/site-packages/reademptionlib/readaligner.py", line 20, in run_alignment
    paired_end=paired_end)
  File "/home/pvc/.local/lib/python3.6/site-packages/reademptionlib/segemehl.py", line 75, in align_reads
    catch_stdout=False)
  File "/home/pvc/.local/lib/python3.6/site-packages/pysam/utils.py", line 75, in call
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=None, stderr=samtools view: failed to open "READemption_analysis/output/align/alignments/InSPI2_R1_alignments_final.bam" for reading: No such file or directory\n'
Following the advice other people received, I'm trying to update segemehl to 0.3.4, but I don't know how to do it using only the instructions provided here: https://reademption.readthedocs.io/en/latest/installation.html I have noticed that the files obtained by unpacking it differ from those of version 0.2.0. Maybe that's why I cannot install it by following those instructions.
Any advice will be more than welcome. Thanks!!