caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

Missing files due to improper folder specification #21

Closed mathosi closed 4 years ago

mathosi commented 4 years ago

Hi,

I am trying to run the example provided for mgatk bcall under mgatk/tests, which results in an error due to missing files. Within bc1d/final/, there is only chrM_refAllele.txt generated. I get the same error when trying to run the genotyping for my own 10x-scATAC data. Do you have an idea what might be the problem here?

cd mgatk/tests
mgatk bcall -i barcode/test_barcode.bam -n bc1 -o bc1d -bt CB -b barcode/test_barcodes.txt -z
Fri Jun 12 10:44:30 CEST 2020: mgatk v0.5.3
Fri Jun 12 10:44:30 CEST 2020: Found bam file: barcode/test_barcode.bam for genotyping.
Fri Jun 12 10:44:30 CEST 2020: Found file of barcodes to be parsed: barcode/test_barcodes.txt
Fri Jun 12 10:44:30 CEST 2020: User specified mitochondrial genome matches .bam file
Fri Jun 12 10:44:33 CEST 2020: Finished determining/splitting barcodes for genotyping.
Fri Jun 12 10:44:33 CEST 2020: Genotyping samples with 1 threads

Error in checkGrep(grep(".A.txt", files)) :
  Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep
Execution halted

Thanks and best wishes, Malte
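
For anyone debugging the same symptom, a quick way to see which outputs never reached final/ is to compare the directory listing against the files a successful run produces. The sketch below is only an illustration: the suffix list is copied from a successful listing posted further down in this thread, and the helper name missing_outputs is made up here; it is not part of mgatk.

```python
import os

# Suffixes taken from a successful listing later in this thread.
# This is an assumption for illustration, not mgatk's documented contract.
EXPECTED_SUFFIXES = ("A.txt.gz", "C.txt.gz", "G.txt.gz", "T.txt.gz", "coverage.txt.gz")


def missing_outputs(present_names, name):
    """Return the expected '<name>.<suffix>' files that are not in present_names."""
    present = set(present_names)
    return [f"{name}.{s}" for s in EXPECTED_SUFFIXES if f"{name}.{s}" not in present]


if __name__ == "__main__":
    final_dir = "bc1d/final"  # output folder and run name used in this thread
    if os.path.isdir(final_dir):
        print(missing_outputs(os.listdir(final_dir), "bc1"))
```

If every expected file is reported missing, as in the listing above where only chrM_refAllele.txt exists, the failure most likely happened in an earlier step and the logs under logs/ are the next place to look.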

caleblareau commented 4 years ago

Could you please send the intermediate output files contained in the folder?

caleblareau commented 4 years ago

@mathosi thanks for the email. The error has to do with an incompatibility issue in snakemake that was recently addressed in version 0.5.5, which has now been pushed to PyPI. Please update and rerun; I anticipate that this will resolve the issue.

mathosi commented 4 years ago

Thank you, updating to version 0.5.5 solved the problem!

bobermayer commented 4 years ago

Hi, I'm getting the same error under version 0.5.6 (snakemake version 5.20.1):

$ mgatk bcall -i test/test_barcode.bam -n bc1 -o bc1d -bt CB -b test/test_barcodes.txt -z
Wed Aug 05 14:50:49 CEST 2020: mgatk v0.5.6
Wed Aug 05 14:50:50 CEST 2020: Found bam file: test/test_barcode.bam for genotyping.
Wed Aug 05 14:50:50 CEST 2020: Found file of barcodes to be parsed: test/test_barcodes.txt
Wed Aug 05 14:50:50 CEST 2020: User specified mitochondrial genome matches .bam file
Wed Aug 05 14:50:53 CEST 2020: Finished determining/splitting barcodes for genotyping.
Wed Aug 05 14:50:53 CEST 2020: Genotyping samples with 2 threads
Error in checkGrep(grep(".A.txt", files)) : 
  Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep
Execution halted

any ideas? thanks! Benedikt

caleblareau commented 4 years ago

Can you post the contents of the folder ? (ls -lR)

bobermayer commented 4 years ago

hi, sure:

$ ls -lR test
test:
total 7169
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 7333460 Aug  5 14:49 test_barcode.bam
-rw-r--r-- 1 obermayb_c hpc-ag-cubi     864 Aug  5 14:49 test_barcode.bam.bai
-rw-r--r-- 1 obermayb_c hpc-ag-cubi      57 Aug  5 14:49 test_barcodes.txt

and

$ ls -lR bc1d
bc1d:
total 3
drwxr-sr-x 2 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 fasta
drwxr-sr-x 2 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 final
drwxr-sr-x 4 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 logs
drwxr-sr-x 3 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 qc
drwxr-sr-x 5 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 temp

bc1d/fasta:
total 257
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 16907 Aug  6 08:42 chrM.fasta
-rw-r--r-- 1 obermayb_c hpc-ag-cubi    19 Aug  6 08:42 chrM.fasta.fai

bc1d/final:
total 256
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 121446 Aug  6 08:42 chrM_refAllele.txt

bc1d/logs:
total 259
-rw-r--r-- 1 obermayb_c hpc-ag-cubi  438 Aug  6 08:42 base.mgatk.log
-rw-r--r-- 1 obermayb_c hpc-ag-cubi  452 Aug  6 08:42 bc1.parameters.txt
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 2002 Aug  6 08:42 bc1.snakemake_gather.log
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 5171 Aug  6 08:42 bc1.snakemake_scatter.log
drwxr-sr-x 2 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 filterlogs
drwxr-sr-x 2 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 rmdupslogs

bc1d/logs/filterlogs:
total 1
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 21 Aug  6 08:42 CTAACTTAGAGCCACA-1.filter.log
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 21 Aug  6 08:42 GCCTAGGCAGTTCGGC-1.filter.log

bc1d/logs/rmdupslogs:
total 0

bc1d/qc:
total 1
drwxr-sr-x 2 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 quality

bc1d/qc/quality:
total 0

bc1d/temp:
total 2
drwxr-sr-x 2 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 barcoded_bams
drwxr-sr-x 2 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 quality
drwxr-sr-x 2 obermayb_c hpc-ag-cubi 4096 Aug  6 08:42 temp_bam

bc1d/temp/barcoded_bams:
total 7682
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 2765493 Aug  6 08:42 CACCACTAGGAGGCGA-1.bam
-rw-r--r-- 1 obermayb_c hpc-ag-cubi     808 Aug  6 08:42 CACCACTAGGAGGCGA-1.bam.bai
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 2429008 Aug  6 08:42 CTAACTTAGAGCCACA-1.bam
-rw-r--r-- 1 obermayb_c hpc-ag-cubi     808 Aug  6 08:42 CTAACTTAGAGCCACA-1.bam.bai
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 2116311 Aug  6 08:42 GCCTAGGCAGTTCGGC-1.bam
-rw-r--r-- 1 obermayb_c hpc-ag-cubi     792 Aug  6 08:42 GCCTAGGCAGTTCGGC-1.bam.bai

bc1d/temp/quality:
total 0

bc1d/temp/temp_bam:
total 9729
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 2429008 Aug  6 08:42 CTAACTTAGAGCCACA-1.temp0.bam
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 2429112 Aug  6 08:42 CTAACTTAGAGCCACA-1.temp1.bam
-rw-r--r-- 1 obermayb_c hpc-ag-cubi     808 Aug  6 08:42 CTAACTTAGAGCCACA-1.temp1.bam.bai
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 2116311 Aug  6 08:42 GCCTAGGCAGTTCGGC-1.temp0.bam
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 2116416 Aug  6 08:42 GCCTAGGCAGTTCGGC-1.temp1.bam
-rw-r--r-- 1 obermayb_c hpc-ag-cubi     792 Aug  6 08:42 GCCTAGGCAGTTCGGC-1.temp1.bam.bai

caleblareau commented 4 years ago

From this directory, can you run

cat bc1d/logs/*.log

There should be some kind of error message there.

bobermayer commented 4 years ago

yes of course.

$ cat bc1d/logs/bc1.snakemake_scatter.log 
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    1   make_sample_list
    3   process_one_sample
    5

[Thu Aug  6 08:42:34 2020]
rule process_one_sample:
    input: bc1d/.internal/samples/GCCTAGGCAGTTCGGC-1.bam.txt
    output: bc1d/temp/ready_bam/GCCTAGGCAGTTCGGC-1.qc.bam, bc1d/temp/ready_bam/GCCTAGGCAGTTCGGC-1.qc.bam.bai, bc1d/qc/depth/GCCTAGGCAGTTCGGC-1.depth.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.A.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.C.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.G.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.T.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.coverage.txt
    jobid: 2
    wildcards: sample=GCCTAGGCAGTTCGGC-1

[Thu Aug  6 08:42:34 2020]
rule process_one_sample:
    input: bc1d/.internal/samples/CTAACTTAGAGCCACA-1.bam.txt
    output: bc1d/temp/ready_bam/CTAACTTAGAGCCACA-1.qc.bam, bc1d/temp/ready_bam/CTAACTTAGAGCCACA-1.qc.bam.bai, bc1d/qc/depth/CTAACTTAGAGCCACA-1.depth.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.A.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.C.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.G.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.T.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.coverage.txt
    jobid: 3
    wildcards: sample=CTAACTTAGAGCCACA-1

Job counts:
    count   jobs
    1   process_one_sample
    1
Job counts:
    count   jobs
    1   process_one_sample
    1
Traceback (most recent call last):
  File "/fast/work/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/mgatk/bin/python/oneSample.py", line 80, in <module>
    pysam.index(outputbam)
  File "/fast/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/pysam/utils.py", line 61, in __call__
    save_stdout=kwargs.get("save_stdout", None))
  File "pysam/libcutils.pyx", line 293, in pysam.libcutils._pysam_dispatch
OSError: No such file or directory: 'bc1d/temp/ready_bam/GCCTAGGCAGTTCGGC-1.qc.bam'
Traceback (most recent call last):
  File "/fast/work/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/mgatk/bin/python/oneSample.py", line 80, in <module>
    pysam.index(outputbam)
  File "/fast/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/pysam/utils.py", line 61, in __call__
    save_stdout=kwargs.get("save_stdout", None))
  File "pysam/libcutils.pyx", line 293, in pysam.libcutils._pysam_dispatch
OSError: No such file or directory: 'bc1d/temp/ready_bam/CTAACTTAGAGCCACA-1.qc.bam'
MissingOutputException in line 21 of /fast/work/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/mgatk/bin/snake/Snakefile.Scatter:
Job completed successfully, but some output files are missing. Missing files after 5 seconds:
bc1d/temp/ready_bam/CTAACTTAGAGCCACA-1.qc.bam
bc1d/temp/ready_bam/CTAACTTAGAGCCACA-1.qc.bam.bai
bc1d/qc/depth/CTAACTTAGAGCCACA-1.depth.txt
bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.A.txt
bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.C.txt
bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.G.txt
bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.T.txt
bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.coverage.txt
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
  File "/fast/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 544, in handle_job_success
  File "/fast/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 231, in handle_job_success
MissingOutputException in line 21 of /fast/work/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/mgatk/bin/snake/Snakefile.Scatter:
Job completed successfully, but some output files are missing. Missing files after 5 seconds:
bc1d/temp/ready_bam/GCCTAGGCAGTTCGGC-1.qc.bam
bc1d/temp/ready_bam/GCCTAGGCAGTTCGGC-1.qc.bam.bai
bc1d/qc/depth/GCCTAGGCAGTTCGGC-1.depth.txt
bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.A.txt
bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.C.txt
bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.G.txt
bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.T.txt
bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.coverage.txt
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
  File "/fast/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 544, in handle_job_success
  File "/fast/users/obermayb_c/scratch/miniconda/envs/mgatk/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 231, in handle_job_success
Exiting because a job execution failed. Look above for error message
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /fast/work/groups/cubi/projects/2020-06-29_Starossom_lineage_tracing/2020_07_29_cellranger/.snakemake/log/2020-08-06T084234.060411.snakemake.log

there's another error in bc1.snakemake_gather.log probably caused by the missing files

does that help?

thanks a lot!
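
The useful lines in a log like this are easy to lose among the repeated tracebacks. As a rough sketch (the function name summarize_log and the matching rules are assumptions based only on the log text above, not on any Snakemake API), the Python exception lines and the files reported as missing can be pulled out like this:

```python
import re
import sys


def summarize_log(text):
    """Return (error_lines, missing_files) found in Snakemake log text."""
    errors, missing = [], []
    collecting = False
    for raw in text.splitlines():
        line = raw.strip()
        if "Missing files after" in line:
            # The paths listed after this line are the outputs Snakemake expected.
            collecting = True
            continue
        if collecting:
            if not line or line.startswith("This might be"):
                collecting = False
            else:
                missing.append(line)
            continue
        # Lines such as "OSError: No such file or directory: ..."
        if re.match(r"[A-Za-z_.]*Error: ", line):
            errors.append(line)
    return errors, missing


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        found_errors, missing_files = summarize_log(handle.read())
    print("\n".join(found_errors))
    print("\n".join(missing_files))
```

In the log above, the earliest exception (the OSError raised from pysam.index) is the more informative one; the MissingOutputException entries only report that the expected outputs were never written.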

caleblareau commented 4 years ago

My best guess is that you are missing Java, which is used for the PCR deduplication. What does which java yield on your machine? And can you run the command with --keep-duplicates, which will circumvent the Java dependency?

bobermayer commented 4 years ago

hi, thanks for the tip. I have java, but not the right version

$ which java
~/scratch/miniconda/envs/mgatk/bin/java
$ java -version
openjdk version "1.7.0_91"

I managed to track the original error down to a Picard error:

Exception in thread "main" java.lang.UnsupportedClassVersionError: picard/cmdline/PicardCommandLine : Unsupported major.minor version 52.0
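
The 52.0 in that message is the class-file major version of the Picard classes. Class files with major version 52 need Java 8 or newer, whereas the 1.7.0_91 runtime shown above is Java 7, which loads class files only up to major version 51. For Java 5 and later, the major version is the release number plus 44. A small sketch of that comparison follows; the helper names are made up for illustration.

```python
import re


def java_release(version_output):
    """Extract the Java release (e.g. 7, 8, 11) from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major, minor = int(match.group(1)), match.group(2)
    # Releases up to Java 8 are reported as 1.x; later ones as x.y.z.
    return int(minor) if major == 1 and minor is not None else major


def required_release(class_major):
    """Minimum Java release for a class-file major version (valid for Java 5+)."""
    return class_major - 44


installed = java_release('openjdk version "1.7.0_91"')
print(installed >= required_release(52))  # prints False: Java 7 cannot load these classes
```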

after installing java v1.8.0 I get to the same point as when I run with --keep-duplicates

$ mgatk bcall -i test/test_barcode.bam -n bc1 -o bc1d -bt CB -b test/test_barcodes.txt -z -c 1 
Mon Aug 10 14:58:00 CEST 2020: mgatk v0.5.6
Mon Aug 10 14:58:00 CEST 2020: Found bam file: test/test_barcode.bam for genotyping.
Mon Aug 10 14:58:00 CEST 2020: Found file of barcodes to be parsed: test/test_barcodes.txt
Mon Aug 10 14:58:01 CEST 2020: User specified mitochondrial genome matches .bam file
Mon Aug 10 14:58:04 CEST 2020: Finished determining/splitting barcodes for genotyping.
Mon Aug 10 14:58:04 CEST 2020: Genotyping samples with 1 threads
Error in `[<-.data.table`(x, j = name, value = value) : 
  RHS of assignment to new column 'sample' is zero length but not empty list(). For new columns the RHS must either be empty list() to create an empty list column, or, have length > 0; e.g. NA_integer_, 0L, etc.
Calls: importMito ... lapply -> FUN -> $<- -> $<-.data.table -> [<-.data.table
Execution halted

the problem now seems to be that the variantFiles (e.g., bc1d/final/bc1.A.txt.gz) don't have a header, and thus importSMs in toRDS.R fails when trying to do

dt$sample <- factor(dt$sample, levels = samplesOrder)

should these files have a header and it wasn't created by makeSM in Snakefile.Gather or should colnames be set in importSMs?

thanks again! Benedikt

caleblareau commented 4 years ago

The headers aren’t intended to be there and shouldn’t throw the error. Can you send an ls -lRh of the output folder? My guess is that the plain text files are being created but are empty.

bobermayer commented 4 years ago

I think these files are there and not empty:

$ ls -hlR bc1d/final
bc1d/final:
total 1.6M
-rw-r--r-- 1 obermayb_c hpc-ag-cubi  97K Aug 10 14:58 bc1.A.txt.gz
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 194K Aug 10 14:58 bc1.coverage.txt.gz
-rw-r--r-- 1 obermayb_c hpc-ag-cubi  96K Aug 10 14:58 bc1.C.txt.gz
-rw-r--r-- 1 obermayb_c hpc-ag-cubi   76 Aug 10 14:58 bc1.depthTable.txt
-rw-r--r-- 1 obermayb_c hpc-ag-cubi  50K Aug 10 14:58 bc1.G.txt.gz
-rw-r--r-- 1 obermayb_c hpc-ag-cubi  77K Aug 10 14:58 bc1.T.txt.gz
-rw-r--r-- 1 obermayb_c hpc-ag-cubi 119K Aug 10 14:58 chrM_refAllele.txt
$ zcat bc1d/final/bc1.A.txt.gz | head -n5
2,CTAACTTAGAGCCACA-1,41,52
5,CTAACTTAGAGCCACA-1,47,60
7,CTAACTTAGAGCCACA-1,47,62
13,CTAACTTAGAGCCACA-1,55,72
16,CTAACTTAGAGCCACA-1,59,74

but I don't see how dt$sample <- factor(dt$sample, levels = samplesOrder) can work without defining column names somewhere (in toRDS.R)

sorry for causing so much trouble and thanks a lot for your help!
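
To illustrate the point about column names outside of R: a headerless file only gives positional columns, so any access by name needs the names to be attached explicitly when the file is read. The following is a minimal Python sketch, not mgatk's code (the actual fix lives in its R script and is not shown in this thread). Only the name "sample" is implied by the discussion; the other column names are placeholders.

```python
import csv
import io

# Placeholder names for illustration; only "sample" is implied by the thread.
COLUMNS = ["position", "sample", "value1", "value2"]


def read_headerless(handle):
    """Read comma-separated rows without a header and attach explicit column names."""
    return [dict(zip(COLUMNS, row)) for row in csv.reader(handle) if row]


rows = read_headerless(io.StringIO(
    "2,CTAACTTAGAGCCACA-1,41,52\n"
    "5,CTAACTTAGAGCCACA-1,47,60\n"
))
print(rows[0]["sample"])  # prints CTAACTTAGAGCCACA-1
```

For a gzipped file such as bc1.A.txt.gz, the same function should work on a handle opened with gzip.open(path, "rt", newline="").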

caleblareau commented 4 years ago

@bobermayer no! this is actually a bug on my part... thanks for digging through it and finding it!

You're totally right: the way that I have that line defined isn't right given the missing header. It's interesting that on my machine, I get a different outcome:

> head(dt)
   V1                 V2 V3 V4
1:  2 ATTGTCTTCCCACTAC-1 10  7
2:  5 ATTGTCTTCCCACTAC-1 10  7
3:  7 ATTGTCTTCCCACTAC-1 10  8
4: 13 ATTGTCTTCCCACTAC-1 11 11
5: 16 ATTGTCTTCCCACTAC-1 11 11
6: 21 ATTGTCTTCCCACTAC-1 14 13
>  dt$sample <- factor(dt$sample, levels = samplesOrder)
> head(dt)
   V1                 V2 V3 V4 sample
1:  2 ATTGTCTTCCCACTAC-1 10  7   <NA>
2:  5 ATTGTCTTCCCACTAC-1 10  7   <NA>
3:  7 ATTGTCTTCCCACTAC-1 10  8   <NA>
4: 13 ATTGTCTTCCCACTAC-1 11 11   <NA>
5: 16 ATTGTCTTCCCACTAC-1 11 11   <NA>
6: 21 ATTGTCTTCCCACTAC-1 14 13   <NA>

It's interesting that your machine throws this as an error but mine didn't. Regardless, this needs to be fixed. I'll bump the version now. Thank you!

caleblareau commented 4 years ago

@bobermayer try running v0.5.8 (available via PyPI) now?

bobermayer commented 4 years ago

yes, great, now it works (took a while because I had a very old version of R in my conda environment and I had to fix a few dependencies there). thanks a lot!

cnk113 commented 4 years ago

Hello,

I'm getting the same error on the latest version.

Error in checkGrep(grep(".A.txt", files)) : 
  Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep

I've also tried running with marking duplicates and still end up at the same place. The output directory looks like this:

final:
chrM_refAllele.txt

logs:
base.mgatk.log  filterlogs  mgatk.parameters.txt  mgatk.snakemake_tenx.log  rmdupslogs

qc:
quality

caleblareau commented 4 years ago

Can you cat the base.mgatk.log and the mgatk.snakemake_tenx.log files?

cnk113 commented 4 years ago

Actually, following #22, using --snake-stdout solved the issue.