Closed Jakob37 closed 5 months ago
Hi Jakob, I'm glad that you guys are trying to use tomte :) I will need some further information to help you debug:
Could you also try executing this from the folder where AE ran?
snakemake --cores 1 sampleAnnotation
Hi Jakob, I'm glad that you guys are trying to use tomte :) I will need some further information to help you debug:
* what version of the pipeline are you running?
Looking at the CHANGELOG, the latest version with a filled in date is 2.0.1
Also to note, this is the first time poking around in DROP, so you might need to guide me further to get the information you want.
But, I'll do my best attempt first.
* Is the material from the DROP database you are using the same as that from your sample?
The txdb data used in the script at the top is indeed the same as that one used in the Nextflow, copied directly from the workfolder. Let me know if I misunderstood your question.
* How many samples do you have in the database? * Could you check if the count matrix is empty?
Hmm. There is one column in the counts variable, if that answers your question (not sure whether I can see the number of samples in the txdb).
> counts
class: RangedSummarizedExperiment
dim: 60662 1
metadata(0):
assays(1): counts
rownames(60662): ENSG00000000003.15 ENSG00000000005.6 ... ENSG00000288587.1 ENSG00000288588.1
rowData names(0):
colnames(1): sample_id
colData names(14): RNA_ID RNA_BAM_FILE ... GENE_ANNOTATION GENOME
For reference, here is also the txdb:
> txdb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: gencode.v33.annotation.gtf
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 227912
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2024-06-18 09:57:30 +0200 (Tue, 18 Jun 2024)
# GenomicFeatures version at creation time: 1.52.1
# RSQLite version at creation time: 2.3.1
# DBSCHEMAVERSION: 1.2
* Could you check if the genome build, paired-end and strand configurations of the sample are the ones indicated in the sample annotation?
Can you specify what sample annotation you point to here, and how to access it?
Could you also try executing this from the folder where AE ran? snakemake --cores 1 sampleAnnotation
Hmm. In the same working directory, I get the following error.
I am guessing this is a basic mistake on my part somewhere, due to not being used to Snakemake.
$ singularity exec /<path>/docker.io-clinicalgenomics-drop-1.3.3.img snakemake --cores 1 sampleAnnotation
FileExistsError in file /<work>/f2/6ed420caf4360c38e6e73a0ee4d1ef/Snakefile, line 12:
sample_annotation.tsv
File "/<work>/f2/6ed420caf4360c38e6e73a0ee4d1ef/Snakefile", line 12, in <module>
File "/opt/conda/lib/python3.11/site-packages/drop/config/DropConfig.py", line 31, in __init__
File "/opt/conda/lib/python3.11/site-packages/drop/config/DropConfig.py", line 127, in setDefaults
File "/opt/conda/lib/python3.11/site-packages/drop/utils.py", line 44, in checkKeys
This error persists even after moving the softlinked sample_annotation.tsv, which resides in the same folder as the Snakefile, and after renaming it. Not sure where it is finding it.
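One way to see what the work directory actually contains, hidden entries included, and where any soft links point. This is only a minimal sketch using the Python standard library; the helper name `describe_entries` is made up for illustration and is not part of DROP or Tomte.

```python
import os


def describe_entries(directory="."):
    """Return (name, kind, link_target) for every entry, hidden ones included."""
    rows = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                # Report where the soft link points, without following it.
                rows.append((entry.name, "symlink", os.readlink(entry.path)))
            elif entry.is_dir():
                rows.append((entry.name, "dir", None))
            else:
                rows.append((entry.name, "file", None))
    return sorted(rows)


# Print a listing roughly comparable to `ls -la` for the current directory.
for name, kind, target in describe_entries("."):
    suffix = f" -> {target}" if target else ""
    print(f"{kind:8}{name}{suffix}")
```

Run from the folder that holds the Snakefile, this would show whether a second sample_annotation.tsv (or a dangling link to one) is still around.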
I'll have to leave this for now, but will come back to Tomte soon.
Hi again, yes, it always takes some time to get used to :)
--reference_drop_annot_file
/path/Output/processed_data/aberrant_expression/{version}/outrider/outrider/total_counts.Rds
snakemake --cores 1 sampleAnnotation command: could you check what files, normal and hidden, are there (ls -a)?
This error is still there for me when using a proper DROP reference dataset. I'll dig in a bit more and see if I can learn more about where the error is coming from.
Running a set of five samples, it did not crash in the same place (I expect it has to do with me running only one sample). But it crashes later instead, without a clear error status.
I see that version 1.3.3 of DROP is used. Would it be possible to update this to the latest version, 1.4.0, before I spend time debugging an outdated version?
Also, have you guys managed to run real samples through DROP in Tomte? I would be interested in seeing how your input files look, if you are able to share.
Here is the new error, still from the aberrant expression module:
Command output:
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 227912
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2024-06-25 16:00:08 +0200 (Tue, 25 Jun 2024)
# GenomicFeatures version at creation time: 1.52.1
# RSQLite version at creation time: 2.3.1
# DBSCHEMAVERSION: 1.2
[1] TRUE
[1] "38.86742 mins"
[1] 61754677
[1] "39.26456 mins"
[1] 83414094
[1] "1.135816 hours"
[1] 140632607
[1] "1.227175 hours"
[1] 107577339
[1] "1.515051 hours"
[1] 173628357
[1] TRUE
1/13
2/13 [unnamed-chunk-1]
3/13
4/13 [unnamed-chunk-2]
5/13
6/13 [unnamed-chunk-3]
7/13 [meanCounts]
8/13
9/13 [unnamed-chunk-4]
10/13 [expressedGenes]
11/13
12/13 [unnamed-chunk-5]
[1] "Tue Jun 25 17:40:12 2024: Initial PCA loss: 7.69369834812004"
[1] "Tue Jun 25 17:40:15 2024: Iteration: 1 loss: 6.64874291425983"
[1] "Tue Jun 25 17:40:16 2024: Iteration: 2 loss: 6.62153902339786"
[1] "Tue Jun 25 17:40:17 2024: Iteration: 3 loss: 6.60042599653787"
[1] "Tue Jun 25 17:40:19 2024: Iteration: 4 loss: 6.58522274600442"
[1] "Tue Jun 25 17:40:20 2024: Iteration: 5 loss: 6.57475038989527"
[1] "Tue Jun 25 17:40:21 2024: Iteration: 6 loss: 6.56773156058167"
[1] "Tue Jun 25 17:40:23 2024: Iteration: 7 loss: 6.56299199071875"
[1] "Tue Jun 25 17:40:25 2024: Iteration: 8 loss: 6.55979920912951"
[1] "Tue Jun 25 17:40:26 2024: Iteration: 9 loss: 6.55764272057436"
[1] "Tue Jun 25 17:40:28 2024: Iteration: 10 loss: 6.55614316988807"
[1] "Tue Jun 25 17:40:29 2024: Iteration: 11 loss: 6.5550629135103"
[1] "Tue Jun 25 17:40:30 2024: Iteration: 12 loss: 6.55443956741798"
[1] "Tue Jun 25 17:40:31 2024: Iteration: 13 loss: 6.55400965973263"
[1] "Tue Jun 25 17:40:33 2024: Iteration: 14 loss: 6.5537021669489"
[1] "Tue Jun 25 17:40:34 2024: Iteration: 15 loss: 6.55350416866071"
Time difference of 21.12652 secs
[1] "Tue Jun 25 17:40:34 2024: 15 Final nb-AE loss: 6.55350416866071"
Command error:
Tue Jun 25 17:38:51 2024: Controlling for confounders ...
Tue Jun 25 17:38:51 2024: Using the autoencoder implementation for controlling.
[1] "Tue Jun 25 17:38:54 2024: Initial PCA loss: 7.63807922394072"
[1] "Tue Jun 25 17:39:05 2024: Iteration: 1 loss: 5.19196309491786"
[1] "Tue Jun 25 17:39:09 2024: Iteration: 2 loss: 5.19196303307466"
Tue Jun 25 17:39:09 2024: the AE correction converged with:5.19196303307466
Time difference of 8.645406 secs
[1] "Tue Jun 25 17:39:09 2024: 2 Final nb-AE loss: 5.19196303307466"
Tue Jun 25 17:39:10 2024: Used the autoencoder implementation for controlling.
Tue Jun 25 17:39:10 2024: P-value calculation ...
Tue Jun 25 17:39:10 2024: Zscore calculation ...
[1] "Evaluation loss: 0.0145868508237345 for q=5"
Tue Jun 25 17:38:51 2024: SizeFactor estimation ...
Tue Jun 25 17:38:52 2024: Controlling for confounders ...
Tue Jun 25 17:38:52 2024: Using the autoencoder implementation for controlling.
[1] "Tue Jun 25 17:38:54 2024: Initial PCA loss: 7.7262947121687"
[1] "Tue Jun 25 17:39:06 2024: Iteration: 1 loss: 6.70590027912551"
[1] "Tue Jun 25 17:39:11 2024: Iteration: 2 loss: 6.68004689941878"
[1] "Tue Jun 25 17:39:15 2024: Iteration: 3 loss: 6.66278146893879"
[1] "Tue Jun 25 17:39:20 2024: Iteration: 4 loss: 6.64749462141839"
[1] "Tue Jun 25 17:39:25 2024: Iteration: 5 loss: 6.63635757021486"
[1] "Tue Jun 25 17:39:29 2024: Iteration: 6 loss: 6.62818792786937"
[1] "Tue Jun 25 17:39:34 2024: Iteration: 7 loss: 6.62239444944122"
[1] "Tue Jun 25 17:39:39 2024: Iteration: 8 loss: 6.61818368093265"
[1] "Tue Jun 25 17:39:44 2024: Iteration: 9 loss: 6.61509587110696"
[1] "Tue Jun 25 17:39:49 2024: Iteration: 10 loss: 6.61291339770892"
[1] "Tue Jun 25 17:39:53 2024: Iteration: 11 loss: 6.611330508688"
[1] "Tue Jun 25 17:39:56 2024: Iteration: 12 loss: 6.6100949529569"
[1] "Tue Jun 25 17:40:00 2024: Iteration: 13 loss: 6.60930078454305"
[1] "Tue Jun 25 17:40:03 2024: Iteration: 14 loss: 6.60877931537462"
[1] "Tue Jun 25 17:40:07 2024: Iteration: 15 loss: 6.60838510676507"
Time difference of 1.108333 mins
[1] "Tue Jun 25 17:40:07 2024: 15 Final nb-AE loss: 6.60838510676507"
Tue Jun 25 17:40:07 2024: Used the autoencoder implementation for controlling.
Tue Jun 25 17:40:07 2024: P-value calculation ...
Tue Jun 25 17:40:08 2024: Zscore calculation ...
[1] "Evaluation loss: 0.0270543485425258 for q=2"
Tue Jun 25 17:40:08 2024: SizeFactor estimation ...
Tue Jun 25 17:40:08 2024: Controlling for confounders ...
Tue Jun 25 17:40:08 2024: Using the autoencoder implementation for controlling.
Tue Jun 25 17:40:34 2024: Used the autoencoder implementation for controlling.
outrider fitting finished
[Tue Jun 25 17:40:35 2024]
Finished job 21.
17 of 24 steps (71%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-06-25T155815.901121.snakemake.log
Work dir:
/<work>/tomte/a1/e3ca6945218579ebc8ac0a6ecc61ee
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
My sample_annotation.tsv, as seen in the work dir, looks like the following (with many more control samples).
It might very well be that I have gotten something wrong here. Pointers are welcome.
RNA_ID RNA_BAM_FILE DNA_VCF_FILE DNA_ID DROP_GROUP PAIRED_END COUNT_MODE COUNT_OVERLAPS SPLICE_COUNTS_DIR STRAND HPO_TERMS GENE_COUNTS_FILE GENE_ANNOTATION GENOME INDIVIDUAL_ID TISSUE SEX STRAND_SPECIFIC
sample_1 sample_1.Aligned.sortedByCoord.out.bam NA NA outrider,fraser True IntersectionStrict True NA reverse NA NA NA NA NA NA NA NA
sample_2 sample_2.Aligned.sortedByCoord.out.bam NA NA outrider,fraser True IntersectionStrict True NA reverse NA NA NA NA NA NA NA NA
sample_5 sample_5.Aligned.sortedByCoord.out.bam NA NA outrider,fraser True IntersectionStrict True NA reverse NA NA NA NA NA NA NA NA
sample_4 sample_4.Aligned.sortedByCoord.out.bam NA NA outrider,fraser True IntersectionStrict True NA reverse NA NA NA NA NA NA NA NA
sample_3 sample_3.Aligned.sortedByCoord.out.bam NA NA outrider,fraser True IntersectionStrict True NA reverse NA NA NA NA NA NA NA NA
GTEX-111YS-0006-SM-5NQBE NA NA NA outrider,fraser True IntersectionStrict True Whole_Blood--GRCh38--gencode29 no NA geneCounts.tsv.gz NA NA GTEX-111YS Whole_Blood Male False
GTEX-1122O-0005-SM-5O99J NA NA NA outrider,fraser True IntersectionStrict True Whole_Blood--GRCh38--gencode29 no NA geneCounts.tsv.gz NA NA GTEX-1122O Whole_Blood Female False
GTEX-1128S-0005-SM-5P9HI NA NA NA outrider,fraser True IntersectionStrict True Whole_Blood--GRCh38--gencode29 no NA geneCounts.tsv.gz NA NA GTEX-1128S Whole_Blood Female False
GTEX-113IC-0006-SM-5NQ9C NA NA NA outrider,fraser True IntersectionStrict True Whole_Blood--GRCh38--gencode29 no NA geneCounts.tsv.gz NA NA GTEX-113IC Whole_Blood Male False
GTEX-113JC-0006-SM-5O997 NA NA NA outrider,fraser True IntersectionStrict True Whole_Blood--GRCh38--gencode29 no NA geneCounts.tsv.gz NA NA GTEX-113JC Whole_Blood Female False
...700+ more samples
Hi Jakob, did it run through AS but fail in AE or did it fail in both? Here is the documentation for running DROP within Tomte. At first sight, remove the last column "STRAND_SPECIFIC" and see if it goes any better.
Yes, we have now run quite a few samples through the pipeline with Tomte. Here are the top 5 lines of the annotation file sampleAnnotation.txt we use; samples 1-5 are from our local database (this is just the first lines). The samples run through Tomte are not in the annotation file because they will be added automatically by the pipeline.
We want to wait for a bit before changing to 1.4.0 as there might be patches in the coming weeks
Hi Jakob, did it run through AS but fail in AE or did it fail in both?
Not sure yet, I had a timeout after 16h for AS and will try to rerun. But seems like it comes pretty far into the AS at least.
Here is the documentation for running DROP within Tomte. At first sight, remove the last column "STRAND_SPECIFIC" and see if it goes any better.
Thank you! Sorry about not reading this carefully before starting issues 🙈 I'll try removing the STRAND_SPECIFIC column (this one is present in the originally downloaded data).
Yes, we have now run quite a few samples through the pipeline with Tomte. Here are the top 5 lines of the annotation file sampleAnnotation.txt we use; samples 1-5 are from our local database (this is just the first lines). The samples run through Tomte are not in the annotation file because they will be added automatically by the pipeline.
OK, great to know that you have managed to get a bunch of samples through. Then it sounds like some basic input mistakes on my side.
We want to wait for a bit before changing to 1.4.0 as there might be patches in the coming weeks
OK I understand!
Another question. When running, do you typically run several of your samples together, or have you also tried running single samples (i.e. one sample together with the group of control samples)?
We have, it should work
OK, I think I figured it out. I didn't correctly assign annotation files to the control samples. So when running, it retrieved only samples with the same annotation from existing samples, yielding only my actual sample. It started processing this one, but crashed in the steps above as it only had one sample to work with.
It would be helpful for DROP to throw an early error if detecting that there is only one sample ... But anyways, I got a good tour of the documentation and source code now. Thank you for the help on the way!
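For anyone hitting the same thing, here is a minimal sketch of the kind of early check I mean. It is not DROP's own logic: it assumes samples are grouped purely by the GENE_ANNOTATION column of the sample annotation, and the function names, the `min_samples` threshold and the annotation values in the example are all made up for illustration.

```python
import csv
import io
from collections import Counter


def count_samples_per_annotation(tsv_text, column="GENE_ANNOTATION"):
    """Count how many rows of a tab-separated sample annotation share each value of `column`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader)


def undersized_groups(counts, min_samples=2):
    """Return the annotation values that have fewer than `min_samples` samples."""
    return sorted(name for name, n in counts.items() if n < min_samples)


# Hypothetical three-row annotation: one sample alone in its annotation group.
example = (
    "RNA_ID\tDROP_GROUP\tGENE_ANNOTATION\n"
    "sample_1\toutrider,fraser\tannotation_a\n"
    "control_1\toutrider,fraser\tannotation_b\n"
    "control_2\toutrider,fraser\tannotation_b\n"
)
counts = count_samples_per_annotation(example)
small = undersized_groups(counts)
if small:
    print("Annotation groups with too few samples:", small)
```

Running something like this on the real sample_annotation.tsv before starting the pipeline would have flagged that my sample ended up alone in its group.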
The AE module is still running after 16h though. Not sure if normal, but I think at least this issue is resolved.
Hi Jakob,
Good to hear that it is working :). If you are running whole blood, the long runtime is likely due to high haemoglobin content. I would advise you to remove it; you can do so by providing tomte with a BED file containing all haemoglobin positions as --subsample_bed.
Description of the bug
Running the Tomte pipeline on a single paired RNA-seq dataset, I run into a crash in the DROP modules.
I am running on the master branch.
Error pasted in the box below.
I have poked around a bit in the results. It is a bit tricky to debug, as it looks like a Snakemake pipeline inside the Nextflow pipeline. But I have started getting a grip of what is happening.
Inside the filterCounts.R script in the DROP_CONFIG_RUN_AE process (in the aberrantExpression Snakemake workflow), it seems to be crashing here:
Running this locally, I got the following stacktrace (and the same error message as before).
Looking inside it, the preceding code looks as follows:
R has an unfortunate tendency to drop a matrix to a vector if only putting in one argument.
This would crash the colSums with the same error.
So my hypothesis is that the filterExpression command isn't built to run only a single sample. Should this be possible, or am I using the pipeline the wrong way? Let me know if I am wrong on the ball here!
Command used and terminal output