a2iEditing / RNAEditingIndexer

A tool for the calculation of RNA-editing index for RNA seq data

Test data is failing on current Docker image #8

Closed mjsteinbaugh closed 4 years ago

mjsteinbaugh commented 4 years ago

Hi, I'm interested in testing out RNAEditingIndexer, but I'm running into configuration issues with the current recommended Docker setup. Here's an attempt at a reprex.

My Docker image is built using the current Dockerfile from this git repo.

image="acidgenomics/rnaeditingindexer"
workdir="/work"
docker pull "$image"
docker run -it \
    --volume="${PWD}:${workdir}" \
    --workdir="$workdir" \
    "$image" \
    bash

The configuration checks out:

cd /bin/AEI/RNAEditingIndexer
./configure.sh

(PS: there seems to be a bug in the config script. It currently prints "No such file or directory" around the license text, as shown below.)

./configure.sh: line 9: :
This work is licensed under the Creative Commons Attribution-Non-Commercial-ShareAlike 4.0 International License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
For use of the software by commercial entities, please inquire with Tel Aviv University at ramot@ramot.org.
© 2019 Tel Aviv University (Erez Y. Levanon, Erez.Levanon@biu.ac.il;
Eli Eisenberg, elieis@post.tau.ac.il;
Shalom Hillel Roth, shalomhillel.roth@live.biu.ac.il).
: No such file or directory
BEDTools Path Executable Test - Succeeded
BEDTools Version Test - Succeeded
SAMTools Path Executable Test - Succeeded
SAMTools Version Test - Succeeded
BAM Utils Path Executable Test - Succeeded
BAM Utils Run Test - Succeeded
BAM Utils Version Test - Succeeded
Java Path Executable Test - Succeeded
Java Run Test - Succeeded (java.version = 1.8.0)
Python 2.7 Path Executable Test - Succeeded
Python 2.7 Version Test - Succeeded

I only want to analyze samples against hg38, and the Docker config looks good, as far as I can tell:

less /bin/AEI/RNAEditingIndexer/src/RNAEditingIndex/Configs/ResourcesPaths.ini
[hg38]
Genome =  /bin/AEI/RNAEditingIndexer/Resources/Genomes/HomoSapiens/ucscHg38Genome.fa
RERegions = /bin/AEI/RNAEditingIndexer/Resources/Regions/HomoSapiens/ucscHg38Alu.bed.gz
SNPs = /bin/AEI/RNAEditingIndexer/Resources/SNPs/HomoSapiens/ucscHg38CommonGenomicSNPs150.bed.gz
RefSeq = /bin/AEI/RNAEditingIndexer/Resources/RefSeqAnnotations/HomoSapiens/ucscHg38RefSeqCurated.bed.gz
GenesExpression = /bin/AEI/RNAEditingIndexer/Resources/GenesExpression/HomoSapiens/ucscHg38GTExGeneExpression.bed.gz

Here's what I'm seeing for the downloaded genome annotations:

ls -ahl /bin/AEI/RNAEditingIndexer/Resources/*/HomoSapiens/*
-rw-r--r--. 1 root root 4.4M Oct 25 14:48 /bin/AEI/RNAEditingIndexer/Resources/GenesExpression/HomoSapiens/ucscHg19GTExGeneExpression.bed.gz
-rw-r--r--. 1 root root 4.4M Oct 25 14:45 /bin/AEI/RNAEditingIndexer/Resources/GenesExpression/HomoSapiens/ucscHg38GTExGeneExpression.bed.gz
-rw-r--r--. 1 root root 3.0G Oct 25 14:46 /bin/AEI/RNAEditingIndexer/Resources/Genomes/HomoSapiens/ucscHg19Genome.fa
-rw-r--r--. 1 root root 3.1G Oct 25 14:44 /bin/AEI/RNAEditingIndexer/Resources/Genomes/HomoSapiens/ucscHg38Genome.fa
-rw-r--r--. 1 root root 2.8M Oct 25 14:48 /bin/AEI/RNAEditingIndexer/Resources/RefSeqAnnotations/HomoSapiens/ucscHg19RefSeqCurated.bed.gz
-rw-r--r--. 1 root root 3.1M Oct 25 14:45 /bin/AEI/RNAEditingIndexer/Resources/RefSeqAnnotations/HomoSapiens/ucscHg38RefSeqCurated.bed.gz
-rw-r--r--. 1 root root 8.1M Oct 25 14:47 /bin/AEI/RNAEditingIndexer/Resources/Regions/HomoSapiens/ucscHg19Alu.bed.gz
-rw-r--r--. 1 root root 8.1M Oct 25 14:44 /bin/AEI/RNAEditingIndexer/Resources/Regions/HomoSapiens/ucscHg38Alu.bed.gz
-rw-r--r--. 1 root root 139M Oct 25 14:48 /bin/AEI/RNAEditingIndexer/Resources/SNPs/HomoSapiens/ucscHg19CommonGenomicSNPs150.bed.gz
-rw-r--r--. 1 root root 143M Oct 25 14:45 /bin/AEI/RNAEditingIndexer/Resources/SNPs/HomoSapiens/ucscHg38CommonGenomicSNPs150.bed.gz
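
The comparison between ResourcesPaths.ini and the files on disk can also be scripted. The following is a minimal sketch, assuming the file uses plain `Key = /path` lines under `[section]` headers as in the excerpt above. `check_ini_paths` is a hypothetical helper written for this illustration and is not part of RNAEditingIndexer.

```shell
# check_ini_paths INI SECTION
# Hypothetical helper: for every "Key = /path" entry under [SECTION],
# report whether the path exists. Returns non-zero if any path is missing.
check_ini_paths() {
    tab="$(printf '\t')"
    awk -v section="[$2]" '
        # A header line switches the "inside" flag on or off.
        /^\[/ { inside = ($0 == section); next }
        inside && /=/ {
            key = $0;  sub(/[ \t]*=.*/, "", key)
            path = $0; sub(/^[^=]*=[ \t]*/, "", path); sub(/[ \t\r]+$/, "", path)
            print key "\t" path
        }
    ' "$1" | {
        missing=0
        while IFS="$tab" read -r key path; do
            if [ -e "$path" ]; then
                printf 'ok       %s -> %s\n' "$key" "$path"
            else
                printf 'MISSING  %s -> %s\n' "$key" "$path"
                missing=1
            fi
        done
        # The status of this group becomes the status of the function.
        [ "$missing" -eq 0 ]
    }
}

# Path taken from the listing above; the check is skipped outside the container.
ini="/bin/AEI/RNAEditingIndexer/src/RNAEditingIndex/Configs/ResourcesPaths.ini"
if [ -f "$ini" ]; then
    check_ini_paths "$ini" hg38
fi
```

This only confirms that the configured files exist. It says nothing about whether their contents are valid.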

This minimal example currently fails:

RNAEditingIndex \
    -d "/bin/AEI/RNAEditingIndexer/TestResources/BAMs" \
    -f ".bam" \
    --genome hg38 \
    --verbose

Here's the error log:

Error: Unable to open file /bin/AEI/RNAEditingIndexer/TestResources/BAMs/BAMs/SRR5962201/SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only.bam. Exiting.
[E::hts_open_format] Failed to open file ./BAMs/SRR5962201/SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only/SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only_region_ucscHg38Alu.bed.gz_alignments.bam
samtools sort: failed to create "./BAMs/SRR5962201/SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only/SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only_region_ucscHg38Alu.bed.gz_alignments.bam": No such file or directory
[2019-11-04 21:04:49,509] EIPipelineManger ERROR    Process: Editing_Index_Pipline_SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only; Going To Error Step: Step_1_backup
Error: Unable to open file /bin/AEI/RNAEditingIndexer/TestResources/BAMs/BAMs/SRR5962209/SRR5962209_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only.bam. Exiting.
[E::hts_open_format] Failed to open file ./BAMs/SRR5962209/SRR5962209_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only/SRR5962209_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only_region_ucscHg38Alu.bed.gz_alignments.bam
samtools sort: failed to create "./BAMs/SRR5962209/SRR5962209_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only/SRR5962209_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only_region_ucscHg38Alu.bed.gz_alignments.bam": No such file or directory
[2019-11-04 21:04:49,528] EIPipelineManger ERROR    Process: Editing_Index_Pipline_SRR5962209_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only; Going To Error Step: Step_1_backup
Error: Unable to open file /bin/AEI/RNAEditingIndexer/TestResources/BAMs/BAMs/SRR5962217/SRR5962217_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only.bam. Exiting.
[E::hts_open_format] Failed to open file ./BAMs/SRR5962217/SRR5962217_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only/SRR5962217_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only_region_ucscHg38Alu.bed.gz_alignments.bam
samtools sort: failed to create "./BAMs/SRR5962217/SRR5962217_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only/SRR5962217_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only_region_ucscHg38Alu.bed.gz_alignments.bam": No such file or directory
[2019-11-04 21:04:49,549] EIPipelineManger ERROR    Process: Editing_Index_Pipline_SRR5962217_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only; Going To Error Step: Step_1_backup
Error: Unable to open file /bin/AEI/RNAEditingIndexer/TestResources/BAMs/BAMs/SRR5962219/SRR5962219_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only.bam. Exiting.
[E::hts_open_format] Failed to open file ./BAMs/SRR5962219/SRR5962219_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only/SRR5962219_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only_region_ucscHg38Alu.bed.gz_alignments.bam
samtools sort: failed to create "./BAMs/SRR5962219/SRR5962219_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only/SRR5962219_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only_region_ucscHg38Alu.bed.gz_alignments.bam": No such file or directory
[2019-11-04 21:04:49,571] EIPipelineManger ERROR    Process: Editing_Index_Pipline_SRR5962219_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only; Going To Error Step: Step_1_backup
[2019-11-04 21:04:49,577] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove 04-11-2019-21.cnf
[2019-11-04 21:04:49,577] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove 04-11-2019-21.cnf

Best, Mike
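
In the log above, `samtools sort` fails to create its output with "No such file or directory". When creating a file, that message generally means that some directory on the way to the file does not exist. A small sketch for finding which one, assuming it is run from the same working directory as the failing command; `first_missing_dir` is a hypothetical helper, and the path is copied from the log:

```shell
# first_missing_dir PATH
# Hypothetical helper: print the shallowest directory on the way to PATH that
# does not exist. Returns 0 if one was found, 1 if every parent directory exists.
first_missing_dir() {
    dir="$(dirname "$1")"
    missing=""
    while [ ! -d "$dir" ]; do
        missing="$dir"
        parent="$(dirname "$dir")"
        # Stop once dirname no longer changes the path ("/" or ".").
        [ "$parent" = "$dir" ] && break
        dir="$parent"
    done
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing"
        return 0
    fi
    return 1
}

# Output path reported by samtools in the log above.
out="./BAMs/SRR5962201/SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only/SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only_region_ucscHg38Alu.bed.gz_alignments.bam"
first_missing_dir "$out" || echo "all parent directories exist"
```

This narrows down where the output tree stops. It does not explain why the pipeline did not create the directory.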

mjsteinbaugh commented 4 years ago

EditingIndex.2019-11-04T21:09:26.862146.log

mjsteinbaugh commented 4 years ago

OK, rebuilding the Docker image to incorporate the changes on master (2019-11-04) helps, but I'm still seeing warnings and errors when attempting to process hg38 BAM files:

RNAEditingIndex -d "/work/test" -f ".bam" --genome hg38 --verbose
***** WARNING: File /work/test/1-EV-DMSO-A.bam has inconsistent naming convention for record:
1       10541   10633   GWNJ-0901:501:GW1908232415th:7:2223:23967:13721/2       1       +

[E::hts_open_format] Failed to open file ./1-EV-DMSO-A/1-EV-DMSO-A_region_ucscHg38Alu.bed.gz_alignments.bam
samtools sort: failed to create "./1-EV-DMSO-A/1-EV-DMSO-A_region_ucscHg38Alu.bed.gz_alignments.bam": No such file or directory
***** WARNING: File /work/test/1-EV-DMSO-A.bam has inconsistent naming convention for record:
1       10541   10633   GWNJ-0901:501:GW1908232415th:7:2223:23967:13721/2       1       +

[2019-11-05 16:52:53,607] EIPipelineManger ERROR    Process: Editing_Index_Pipline_1-EV-DMSO-A; Going To Error Step: Step_1_backup
[2019-11-05 16:52:53,609] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove 05-11-2019-16.cnf
[2019-11-05 16:52:53,609] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove 05-11-2019-16.cnf
[2019-11-05 16:55:44,412] A2IEditingIndex ERROR    Failed Loading Coverage Data of 1-EV-DMSO-A! (Won't Delete cmpileup)
Traceback (most recent call last):
  File "Tools/EditingIndex/A2IEditingIndex.py", line 856, in get_strands_and_counts
  File "Tools/EditingIndex/DataConverters/CountPileupCoverter.py", line 69, in parse_count_pileup
IOError: [Errno 2] No such file or directory: './1-EV-DMSO-A/1-EV-DMSO-A_ucscHg38Alu.bed.gz_mpileup.cmpileup'
[2019-11-05 16:55:44,723] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove ./ucscHg38Genome.fa.GenomeIndex.jsd
[2019-11-05 16:55:44,723] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove ./1-EV-DMSO-A/1-EV-DMSO-A_ucscHg38Alu.bed.gz_mpileup.cmpileup

shalomhillelroth commented 4 years ago

Hi,

Can you please double check that you ran it with exactly "-f _sampled_with_0.1.Aligned.sortedByCoord.out.bam.AluChr1Only.bam" as stated in the test instructions?

From your log, I suspect that this parameter had a different value. Please note that, as specified in the documentation, this parameter also determines the sample name (see section 5.1.1.1.1.1 of the documentation).

Only the best, Shalom Hillel Roth
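
The effect of the `-f` suffix on the sample name can be illustrated with plain `basename`. This is a sketch of the behaviour described above and seen in the logs, not RNAEditingIndex's actual code. `sample_name` is a hypothetical helper, the file name is taken from the first error log, and the long suffix is chosen to match that file name, so the exact string in the test instructions may differ.

```shell
# sample_name FILE SUFFIX
# Hypothetical helper: print FILE's base name with SUFFIX stripped from the end.
sample_name() {
    basename "$1" "$2"
}

# File name as it appears in the first error log.
bam="SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only.bam"

# With only ".bam" as the suffix, the long alignment tag stays in the name.
sample_name "$bam" ".bam"

# With the full shared suffix, only the accession is left.
sample_name "$bam" "_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only.bam"
```

The two calls print `SRR5962201_sampled_with_0.1.Aligned.sortedByCoord.out.AluChr1Only` and `SRR5962201`, which match the process names in the first and the later logs respectively.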

mjsteinbaugh commented 4 years ago

This still isn't running cleanly for me, using the latest Docker image:

RNAEditingIndex \
    -d "/bin/AEI/RNAEditingIndexer/TestResources/BAMs" \
    -f "_sampled_with_0.1.Aligned.sortedByCoord.out.bam.AluChr1Only.bam" \
    --genome hg38 \
    --verbose
[bam_sort_core] merging from 0 files and 10 in-memory blocks...
[E::hts_open_format] Failed to open file ./BAMs/SRR5962209/SRR5962209/SRR5962209_region_ucscHg38Alu.bed.gz_alignments.bam
samtools sort: failed to create "./BAMs/SRR5962209/SRR5962209/SRR5962209_region_ucscHg38Alu.bed.gz_alignments.bam": No such file or directory
[bam_sort_core] merging from 0 files and 10 in-memory blocks...
[E::hts_open_format] Failed to open file ./BAMs/SRR5962201/SRR5962201/SRR5962201_region_ucscHg38Alu.bed.gz_alignments.bam
samtools sort: failed to create "./BAMs/SRR5962201/SRR5962201/SRR5962201_region_ucscHg38Alu.bed.gz_alignments.bam": No such file or directory
[2019-11-05 18:54:49,472] EIPipelineManger ERROR    Process: Editing_Index_Pipline_SRR5962209; Going To Error Step: Step_1_backup
[2019-11-05 18:54:49,509] EIPipelineManger ERROR    Process: Editing_Index_Pipline_SRR5962201; Going To Error Step: Step_1_backup
[bam_sort_core] merging from 0 files and 10 in-memory blocks...
[E::hts_open_format] Failed to open file ./BAMs/SRR5962217/SRR5962217/SRR5962217_region_ucscHg38Alu.bed.gz_alignments.bam
samtools sort: failed to create "./BAMs/SRR5962217/SRR5962217/SRR5962217_region_ucscHg38Alu.bed.gz_alignments.bam": No such file or directory
[2019-11-05 18:54:51,431] EIPipelineManger ERROR    Process: Editing_Index_Pipline_SRR5962217; Going To Error Step: Step_1_backup
[bam_sort_core] merging from 0 files and 10 in-memory blocks...
[E::hts_open_format] Failed to open file ./BAMs/SRR5962219/SRR5962219/SRR5962219_region_ucscHg38Alu.bed.gz_alignments.bam
samtools sort: failed to create "./BAMs/SRR5962219/SRR5962219/SRR5962219_region_ucscHg38Alu.bed.gz_alignments.bam": No such file or directory
[2019-11-05 18:54:54,933] EIPipelineManger ERROR    Process: Editing_Index_Pipline_SRR5962219; Going To Error Step: Step_1_backup
[2019-11-05 18:54:54,936] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove 05-11-2019-18.cnf
[2019-11-05 18:54:54,936] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove 05-11-2019-18.cnf
[2019-11-05 18:57:42,613] A2IEditingIndex ERROR    Failed Loading Coverage Data of SRR5962201! (Won't Delete cmpileup)
Traceback (most recent call last):
  File "Tools/EditingIndex/A2IEditingIndex.py", line 856, in get_strands_and_counts
  File "Tools/EditingIndex/DataConverters/CountPileupCoverter.py", line 69, in parse_count_pileup
IOError: [Errno 2] No such file or directory: './BAMs/SRR5962201/SRR5962201/SRR5962201_ucscHg38Alu.bed.gz_mpileup.cmpileup'
[2019-11-05 18:57:43,005] A2IEditingIndex ERROR    Failed Loading Coverage Data of SRR5962209! (Won't Delete cmpileup)
Traceback (most recent call last):
  File "Tools/EditingIndex/A2IEditingIndex.py", line 856, in get_strands_and_counts
  File "Tools/EditingIndex/DataConverters/CountPileupCoverter.py", line 69, in parse_count_pileup
IOError: [Errno 2] No such file or directory: './BAMs/SRR5962209/SRR5962209/SRR5962209_ucscHg38Alu.bed.gz_mpileup.cmpileup'
[2019-11-05 18:57:43,395] A2IEditingIndex ERROR    Failed Loading Coverage Data of SRR5962217! (Won't Delete cmpileup)
Traceback (most recent call last):
  File "Tools/EditingIndex/A2IEditingIndex.py", line 856, in get_strands_and_counts
  File "Tools/EditingIndex/DataConverters/CountPileupCoverter.py", line 69, in parse_count_pileup
IOError: [Errno 2] No such file or directory: './BAMs/SRR5962217/SRR5962217/SRR5962217_ucscHg38Alu.bed.gz_mpileup.cmpileup'
[2019-11-05 18:57:43,784] A2IEditingIndex ERROR    Failed Loading Coverage Data of SRR5962219! (Won't Delete cmpileup)
Traceback (most recent call last):
  File "Tools/EditingIndex/A2IEditingIndex.py", line 856, in get_strands_and_counts
  File "Tools/EditingIndex/DataConverters/CountPileupCoverter.py", line 69, in parse_count_pileup
IOError: [Errno 2] No such file or directory: './BAMs/SRR5962219/SRR5962219/SRR5962219_ucscHg38Alu.bed.gz_mpileup.cmpileup'
[2019-11-05 18:57:44,095] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove ./BAMs/SRR5962201/SRR5962201/SRR5962201_ucscHg38Alu.bed.gz_mpileup.cmpileup
[2019-11-05 18:57:44,095] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove ./BAMs/SRR5962217/SRR5962217/SRR5962217_ucscHg38Alu.bed.gz_mpileup.cmpileup
[2019-11-05 18:57:44,096] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove ./BAMs/SRR5962219/SRR5962219/SRR5962219_ucscHg38Alu.bed.gz_mpileup.cmpileup
[2019-11-05 18:57:44,096] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove ./ucscHg38Genome.fa.GenomeIndex.jsd
[2019-11-05 18:57:44,096] general_functions WARNING  GGPSResources.general_functions.remove_files Failed To Remove ./BAMs/SRR5962209/SRR5962209/SRR5962209_ucscHg38Alu.bed.gz_mpileup.cmpileup

mjsteinbaugh commented 4 years ago

EditingIndex.2019-11-05T18:54:44.595907.log

mjsteinbaugh commented 4 years ago

To follow up, I got this working using the recommended steps in the Docker.README.md file. I made a couple of improvements to the Dockerfile, and I'll work on a pull request.

mjsteinbaugh commented 4 years ago

PS @shalomhillelroth, I'm working on some Dockerfile recipe improvements here if you want to take a look: https://github.com/acidgenomics/docker/tree/master/rnaeditingindexer/latest