gagneurlab / drop

Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders
MIT License

Too few IDs in DROP_GROUP mae, please ensure that it has at least 1 IDs, groups: [] #466

Closed: SpaceCropTechnologies closed this issue 3 months ago

SpaceCropTechnologies commented 1 year ago

Good day,

Hello, I am trying to run the monoallelic expression (MAE) module, but I am receiving this error:

too few IDs in DROP_GROUP mae, please ensure that it has at least 1 IDs, groups: []

I can run the aberrant splicing and aberrant expression modules successfully; I only have problems with MAE.

This is my MAE config:

mae:
  run: true
  groups:

[Image: screenshot of my sample annotation]

P.S. Running snakemake sampleAnnotation also gives an error.

vyepez88 commented 1 year ago

It could be that there is no indentation before "- mae" in the groups parameter of the mae dictionary. It should be:

groups:
  - mae
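To check whether the sample annotation assigns any samples to the mae group at all, independently of the config, one can count RNA_IDs per DROP_GROUP. A minimal Python sketch, assuming a tab-separated annotation with RNA_ID and DROP_GROUP columns where DROP_GROUP may list several comma-separated groups; the function name is hypothetical and this is not DROP's own parser:

```python
import csv
from collections import defaultdict


def ids_per_group(annotation_path):
    """Map each DROP_GROUP value to the set of RNA_IDs assigned to it.

    Assumes a tab-separated file with RNA_ID and DROP_GROUP columns, where
    DROP_GROUP may contain several comma-separated groups (e.g. "mae,outrider").
    """
    groups = defaultdict(set)
    with open(annotation_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            rna_id = (row.get("RNA_ID") or "").strip()
            if not rna_id:
                continue  # a row without an RNA_ID cannot belong to any group
            for group in (row.get("DROP_GROUP") or "").split(","):
                group = group.strip()
                if group:  # skip empty cells and stray commas
                    groups[group].add(rna_id)
    return dict(groups)
```

For example, ids_per_group("sample_annotation.tsv").get("mae", set()) returns the RNA_IDs listed under mae; an empty set means the annotation itself assigns no samples to that group, whatever the config says.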
lbundalian commented 1 year ago

Hello. It is still me; I just posted from the wrong account earlier, but yes, that is my current problem. With the indentation you suggested, I still get the same error.

vyepez88 commented 1 year ago

Can you try:

groups: null
lbundalian commented 1 year ago

Hello, good day! It produces the same error with groups: null

vyepez88 commented 1 year ago

Can you double-check that all BAM files exist?
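One way to do this is to test every path listed in the annotation. A minimal Python sketch, assuming a tab-separated annotation that stores file paths in RNA_BAM_FILE and DNA_VCF_FILE columns; the helper name is hypothetical:

```python
import csv
import os


def missing_files(annotation_path, columns=("RNA_BAM_FILE", "DNA_VCF_FILE")):
    """Return (line number, column, path) for every listed path that does not exist.

    Assumes a tab-separated annotation with a header on line 1 and no embedded
    newlines, so data rows start at line 2. Empty cells are skipped.
    """
    missing = []
    with open(annotation_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for line_no, row in enumerate(reader, start=2):
            for column in columns:
                path = (row.get(column) or "").strip()
                if path and not os.path.isfile(path):
                    missing.append((line_no, column, path))
    return missing
```

Because each hit carries the line number and column, a misspelled path can be located directly in the annotation file.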

lbundalian commented 1 year ago

OK, thanks, I will.

lbundalian commented 1 year ago

It is working now; one of the BAM file paths was misspelled. However, I am now getting this:

[Sun May 14 15:12:35 2023] Error in rule mae_createSNVs:
jobid: 52
input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/31800SL_S26-gatk-haplotype.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3032142_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/params/snvs/3032142_snvParams.csv
output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3032142--3032142.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3032142--3032142.vcf.gz.tbi
shell:

    /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/31800SL_S26-gatk-haplotype.vcf.gz         3032142 /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3032142_RAligned.sortedByCoord.out.bam /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3032142--3032142.vcf.gz         bcftools samtools

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Sun May 14 15:12:35 2023] rule mae_createSNVs:
input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Archive/DROP3.DEMO/Data/qc_vcf_1000G.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3029387_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/params/snvs/3029387_snvParams.csv
output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3029387.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3029387.vcf.gz.tbi
jobid: 105
reason: Missing output files: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3029387.vcf.gz
wildcards: vcf=QC, rna=3029387
resources: tmpdir=/tmp

[Sun May 14 15:12:47 2023] Finished job 111.
1 of 109 steps (1%) done
[Sun May 14 15:12:47 2023] Finished job 69.
2 of 109 steps (2%) done
[Sun May 14 15:12:47 2023] Finished job 84.
3 of 109 steps (3%) done
[Sun May 14 15:12:47 2023] Finished job 105.
4 of 109 steps (4%) done
[Sun May 14 15:12:48 2023] Finished job 96.
5 of 109 steps (5%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-05-14T151231.826343.snakemake.log

This only shows that a job failed; I can't pinpoint the exact error because the log gives no description for it (jobid: 52).

vyepez88 commented 1 year ago

There might be something wrong with this VCF file: 31800SL_S26-gatk-haplotype.vcf.gz. Make sure that the sample ID inside the VCF file is the same one you specified in the DNA_ID column of the sample annotation.
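The sample IDs inside a VCF are the column names after FORMAT on the #CHROM header line, so the check amounts to reading that line and looking for the DNA_ID. A minimal Python sketch (stdlib only; function names are hypothetical). From the command line, bcftools query -l file.vcf.gz prints the same list of sample names.

```python
import gzip


def vcf_sample_ids(vcf_path):
    """Return the sample names from the #CHROM header line of a VCF.

    Handles plain-text and gzip/bgzip-compressed files: bgzip output is valid
    multi-member gzip, which Python's gzip module reads transparently.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # 8 fixed columns (CHROM..INFO) plus FORMAT precede the samples
                return line.rstrip("\r\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached data records without seeing a #CHROM line
    raise ValueError(f"no #CHROM header line found in {vcf_path}")


def dna_id_in_vcf(vcf_path, dna_id):
    """True if dna_id, as written in the DNA_ID column, names a sample in the VCF."""
    return str(dna_id) in vcf_sample_ids(vcf_path)
```

A False result is exactly the mismatch described above: the DNA_ID in the annotation does not name any sample in that VCF.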

lbundalian commented 1 year ago

Hello. I have checked, but I think it is still the same error:

Error in rule mae_createSNVs:
jobid: 28
input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3021163-gatk-haplotype.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3032087_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/params/snvs/3032087_snvParams.csv
output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3021163--3032087.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3021163--3032087.vcf.gz.tbi
shell:

    /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3021163-gatk-haplotype.vcf.gz         3021163 /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3032087_RAligned.sortedByCoord.out.bam /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3021163--3032087.vcf.gz         bcftools samtools

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Filter SNVs
Failed to read from /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3022344-gatk-haplotype.vcf.gz: not compressed with bgzip
Failed to read from standard input: unknown file type
Failed to read from standard input: unknown file type
Failed to read from standard input: unknown file type
[Mon May 15 11:16:09 2023] Error in rule mae_createSNVs:
jobid: 19
input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3022344-gatk-haplotype.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030025_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/params/snvs/3030025_snvParams.csv
output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3022344--3030025.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3022344--3030025.vcf.gz.tbi
shell:

    /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3022344-gatk-haplotype.vcf.gz         3022344 /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030025_RAligned.sortedByCoord.out.bam /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3022344--3030025.vcf.gz         bcftools samtools

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
mincej commented 1 year ago

I had an error like this. There is a difference between gzipped and bgzipped files: every bgzipped file is a valid gzip file, but not the other way around. I fixed this by decompressing and recompressing with bgzip, something like:

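gunzip file.vcf.gz
bgzip file.vcf              # bgzip ships with htslib, alongside samtools/bcftools; writes file.vcf.gz
tabix -p vcf file.vcf.gz    # optional: build the .tbi index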

https://github.com/samtools/bcftools/issues/668
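Since gunzip decompresses both formats without complaint, it cannot tell them apart. Per the SAM/BAM specification, every BGZF (bgzip) block is a gzip member whose header sets the FEXTRA flag and carries an extra subfield with identifiers B and C holding the block size. A minimal Python sketch (stdlib only; the function name is hypothetical) that checks the first block of a file for that subfield:

```python
import struct


def is_bgzf(path):
    """Return True if the first block of `path` looks like a BGZF (bgzip) block.

    Checks the gzip magic bytes and deflate method, the FEXTRA flag, and a
    'BC' extra subfield, following the BGZF description in the SAM/BAM spec.
    """
    with open(path, "rb") as handle:
        header = handle.read(12)  # ID1 ID2 CM FLG MTIME(4) XFL OS XLEN(2)
        if len(header) < 12 or header[:3] != b"\x1f\x8b\x08":
            return False  # not gzip, or not deflate-compressed
        if not header[3] & 0x04:
            return False  # FEXTRA unset: plain gzip without a block size field
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    # Walk the extra subfields: SI1, SI2 (1 byte each), SLEN (uint16), payload
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if si1 == 66 and si2 == 67 and slen == 2:  # 'B', 'C'
            return True
        pos += 4 + slen
    return False
```

A file that fails this check but decompresses fine with gunzip is a plain gzip file and needs the recompression step above.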

lbundalian commented 1 year ago

OK, I tried another fix that didn't work, so I will give this one a try. Thank you.

lbundalian commented 1 year ago

After recompressing the file, the error now looks like this:

[Image: screenshot of the new error]

vyepez88 commented 1 year ago

Hard to say; for some reason there is a formatting error in that VCF file. Did the other samples run through?

lbundalian commented 1 year ago

None of them ran. Now I am having this problem:

ERROR: No allele-specific counts
Make sure that the chromosome styles of the FASTA reference and BAM file match. If that isn't the issue, check that your VCF and BAM files are correctly formatted. If this problem persists and if this is your only sample causing issues, consider removing it from your analysis, as a last resort.

MAE ID: QC--3030732
VCF file: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3030732.vcf.gz
BAM file: /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030732_RAligned.sortedByCoord.out.bam
FASTA file: /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.fa
Additionally the ReadGroups may be poorly formed. Please refer to https://gagneurlab-drop.readthedocs.io/en/latest/help.html for more information
[Wed May 17 07:29:04 2023] Error in rule mae_allelicCounts:
jobid: 26
input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3030732.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030732_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.fa, /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.dict, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/ASEReadCounter.sh
output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/allelic_counts/QC--3030732.csv.gz
shell:

    /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/ASEReadCounter.sh /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt         /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3030732.vcf.gz /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030732_RAligned.sortedByCoord.out.bam QC--3030732         /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.fa True /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/allelic_counts/QC--3030732.csv.gz         bcftools samtools gatk

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job mae_allelicCounts since they might be corrupted: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/allelic_counts/QC--3030732.csv.gz

vyepez88 commented 1 year ago

Can you check that the chromosome naming styles (e.g. chr1 vs 1) of these files match:

VCF file: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3030732.vcf.gz
BAM file: /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030732_RAligned.sortedByCoord.out.bam
FASTA file: /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.fa
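A minimal Python sketch (stdlib only) for comparing the styles: it reads reference names from the BAM header, from the reference's sequence dictionary (the .dict file listed among the rule's inputs), and from the VCF's CHROM column, then classifies only the primary chromosomes, since scaffold names such as KI270728.1 follow neither the chr1 nor the 1 pattern. The function names are hypothetical; samtools view -H on the BAM shows the same @SQ lines from the command line.

```python
import gzip
import re
import struct

# Primary chromosomes only; scaffold names match neither pattern and are ignored.
_UCSC = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")
_NCBI = re.compile(r"^([1-9]|1[0-9]|2[0-2]|X|Y|MT)$")


def bam_reference_names(bam_path):
    """Read reference names from a BAM header, following the SAM/BAM spec.

    BAM is BGZF-compressed, and BGZF is multi-member gzip, so the standard
    gzip module can decompress the header.
    """
    with gzip.open(bam_path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError(f"{bam_path} is not a BAM file")
        (l_text,) = struct.unpack("<i", handle.read(4))
        handle.read(l_text)  # skip the plain-text SAM header
        (n_ref,) = struct.unpack("<i", handle.read(4))
        names = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", handle.read(4))
            names.append(handle.read(l_name).rstrip(b"\x00").decode())
            handle.read(4)  # l_ref: reference length, not needed here
        return names


def dict_sequence_names(dict_path):
    """Read SN: values from the @SQ lines of a sequence dictionary (.dict)."""
    names = []
    with open(dict_path) as handle:
        for line in handle:
            if line.startswith("@SQ"):
                for field in line.rstrip("\r\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        names.append(field[3:])
    return names


def vcf_chromosomes(vcf_path, max_records=10000):
    """Collect distinct CHROM values from the first max_records records of a gzipped VCF."""
    seen = []
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            chrom = line.split("\t", 1)[0]
            if chrom not in seen:
                seen.append(chrom)
            max_records -= 1
            if max_records <= 0:
                break
    return seen


def chrom_styles(names):
    """Return the naming styles ('UCSC' = chr1, 'NCBI' = 1) used by primary chromosomes."""
    styles = set()
    for name in names:
        if _UCSC.match(name):
            styles.add("UCSC")
        elif _NCBI.match(name):
            styles.add("NCBI")
    return styles
```

For the three files listed above, chrom_styles should return the same single-element set for the BAM, the VCF, and the .dict; a {"UCSC"} next to an {"NCBI"} is exactly the mismatch the error message warns about.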
lbundalian commented 1 year ago

This is my error now:

[Image: screenshot of the error]

lbundalian commented 1 year ago

Why would it be different if they come from the same RNA_ID?

vyepez88 commented 1 year ago

That problem arose because either the RNA_IDs or the DNA_IDs are numeric. This is now fixed in the dev branch.
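For DROP versions that predate this fix, a possible workaround (an assumption, not something the maintainers prescribed in this thread) is to give every sample a non-numeric ID. A minimal Python sketch that flags purely numeric values in the RNA_ID and DNA_ID columns of a tab-separated annotation; the helper name is hypothetical:

```python
import csv


def numeric_ids(annotation_path, columns=("RNA_ID", "DNA_ID")):
    """List (column, value) pairs where an ID consists only of digits.

    Assumes a tab-separated sample annotation with a header row.
    Empty cells are ignored.
    """
    flagged = []
    with open(annotation_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column in columns:
                value = (row.get(column) or "").strip()
                # digits only, e.g. 3030732: a type-inferring reader may parse it as a number
                if value.isdigit():
                    flagged.append((column, value))
    return flagged
```

Keep in mind that a renamed DNA_ID must still match the sample name inside the corresponding VCF.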

vyepez88 commented 3 months ago

This is now fixed in the master branch (1.3.4).