topasnaa closed this issue 11 months ago
Hi, sorry about that. The input condition sheet should actually be a CSV. Let me know if it works with the condition sheet as:

sample_id,condition
barcode01,untreated
barcode02,untreated
barcode03,untreated
barcode10,treated
barcode11,treated
barcode12,treated
I have since updated the documentation to reflect this.
I have the same problem. I changed my condition sheet to csv, but I still got the "join mismatch" error.
Hi, I've been unable to recreate the error other than by using a TSV with commas in it instead of a CSV. Are you providing your new CSV file to the --condition_sheet parameter? Does your error message look exactly the same as above?
Sarah,
Thank you for getting back to me!
Using a CSV instead of a TSV certainly helped; the process ran for longer, but eventually it still aborted, strangely with the same error message (copied below).
I wonder if there is a problem with my sample sheet; would you be able to share a sample sheet that works for you?
Below is the error message from a recent attempt. Please let me know if it would be helpful for me to send any files.
Claudia
P.S. I cc-ed my student Tahmina who is also stuck
This is epi2me-labs/wf-transcriptomes v0.1.10.
--------------------------------------------------------------------------------
Checking fastq input.
Doing reference based transcript analysis
[46/67f78b] Submitted process > pipeline:getVersions
[42/523697] Submitted process > pipeline:getParams
[b2/110741] Submitted process > validate_sample_sheet
[8f/91837b] Submitted process > pipeline:build_minimap_index
[fc/042631] Submitted process > fastcat (1)
[94/631344] Submitted process > fastcat (6)
[b7/0bff9e] Submitted process > fastcat (3)
[cf/1b46e0] Submitted process > fastcat (2)
[5f/2f1f02] Submitted process > fastcat (5)
[22/198afb] Submitted process > fastcat (4)
[3b/c47669] Submitted process > pipeline:collectFastqIngressResultsInDir (1)
[aa/f33a47] Submitted process > pipeline:collectFastqIngressResultsInDir (2)
[a5/74bd05] Submitted process > pipeline:collectFastqIngressResultsInDir (3)
[84/eb16d7] Submitted process > pipeline:collectFastqIngressResultsInDir (4)
[18/501f09] Submitted process > pipeline:collectFastqIngressResultsInDir (5)
[9d/e63f24] Submitted process > output (2)
[47/1f071a] Submitted process > output (1)
[4d/b85d05] Submitted process > output (3)
[0c/8a1c27] Submitted process > output (4)
[72/bdf7d4] Submitted process > output (5)
[90/b69bad] Submitted process > pipeline:preprocess_reads (1)
Join mismatch for the following entries:
key=barcode06 values=[]
key=barcode05 values=[]
key=t10-3 values=[/Users/claudia/epi2melabs/instances/wf-transcriptomes_b0fcfdec-395a-4ac1-8423-6997452a553d/work/94/6313441ae4df2f7a3a392edb18c9d2/seqs.fastq.gz]
key=barcode02 values=[]
key=t0-3 values=[/Users/claudia/epi2melabs/instances/wf-transcriptomes_b0fcfdec-395a-4ac1-8423-6997452a553d/work/b7/0bff9ec0b8407d02592f27a27fe66b/seqs.fastq.gz]
key=barcode01 values=[]
key=t0-2 values=[/Users/claudia/epi2melabs/instances/wf-transcriptomes_b0fcfdec-395a-4ac1-8423-6997452a553d/work/cf/1b46e00cbeaaf73c38f7d98575bdfc/seqs.fastq.gz]
key=t0-1 values=[/Users/claudia/epi2melabs/instances/wf-transcriptomes_b0fcfdec-395a-4ac1-8423-6997452a553d/work/fc/0426311d43f0af95bb534f45272ceb/seqs.fastq.gz]
key=barcode10 values=[]
key=barcode09 values=[] (more omitted)
WARN: Killing running tasks (1)
Hi, if you are using a sample sheet, the alias column in the sample sheet should match the sample_id column in the condition sheet. I can update the documentation to make this clearer.
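That correspondence can be sanity-checked before launching the workflow. A minimal sketch in Python (the file names, and the assumption that both sheets are CSVs with `alias` and `sample_id` columns, come from this thread rather than the workflow docs):

```python
# Sketch: verify that every sample_id in the condition sheet has a
# matching alias in the sample sheet (the mismatch behind the
# "join mismatch" errors in this thread). File names are placeholders.
import csv

def check_sheets(sample_sheet_csv, condition_sheet_csv):
    """Return (sample_ids missing from aliases, aliases missing from sample_ids)."""
    with open(sample_sheet_csv, newline="") as f:
        aliases = {row["alias"] for row in csv.DictReader(f)}
    with open(condition_sheet_csv, newline="") as f:
        sample_ids = {row["sample_id"] for row in csv.DictReader(f)}
    return sample_ids - aliases, aliases - sample_ids
```

If the first returned set is non-empty, the condition sheet has entries the workflow cannot join to any input.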
Thank you, Sarah!
For my sample sheet, I have columns for barcode, sample_id, and alias. What about a type column in the sample sheet? I want to do differential expression, so I was thinking I would need a type "control" versus type "test_sample", but it does not seem to allow "control". It allows "positive_control", but technically that's not what I would call my untreated samples. Do I use "test_sample" for all, including the control?
For my condition sheet, I have 2 columns, barcode and condition (treated, untreated), do I need anything else here?
Also, the run now aborts at the preprocessing step with pychopper; are there any options I should change in pychopper for mapping my cDNA to genomic DNA?
Claudia
That should be fine for the sample sheet and condition sheet; you don't need type for this workflow. Could you share the error message you get? You can find it by clicking on the Logs tab in Labs, or by clicking Open Instance in the Overview tab; there should be a nextflow log/txt file.
here is a recent nextflow.log file
Sarah, today pychopper seemed to have worked, but the pipeline aborted at the next step, the RNA mapping.
Attached is a more recent nextflow.log
Hi, I don't think I can access the log attachments but could you maybe just copy and paste the last lines or error message?
pipeline:reference_assembly:map_reads (4) terminated with an error exit status (1)
"context_internal_priming_fail_start.fasta"
"context_internal_priming_fail_end.fasta"
Work dir:
/Users/claudia/epi2melabs/instances/wf-transcriptomes_f55e1bed-43e5-4243-81aa-9916736b2f86/work/68/851b16faba91df1c30b2504917669d
Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
Sarah,
I also tried a different reference genome; it stops also at the RNA mapping, but with different error messages:
pipeline:reference_assembly:map_reads (2) terminated with an error exit status (137)
"context_internal_priming_fail_start.fasta"
"context_internal_priming_fail_end.fasta"
Work dir:
/Users/claudia/epi2melabs/instances/wf-transcriptomes_76ae40a5-b5e3-40de-9d56-341ce937d04b/work/a6/9e4b1ca744ef7ecc919fcf9933904c
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
On Wed, May 10, 2023 at 7:19 AM Claudia Stone @.***> wrote:
This is epi2me-labs/wf-transcriptomes v0.1.10.
Checking fastq input.
Doing reference based transcript analysis
[dd/69341c] Cached process > pipeline:build_minimap_index
[03/d652cd] Cached process > validate_sample_sheet
[37/631a2b] Cached process > pipeline:getParams
[e1/ec2de5] Cached process > pipeline:getVersions
[f9/92ecc3] Cached process > fastcat (6)
[41/03d52b] Cached process > fastcat (1)
[b3/74b383] Cached process > fastcat (5)
[9a/2bae99] Cached process > fastcat (3)
[ea/05bacf] Cached process > fastcat (4)
[67/8cdc7e] Cached process > fastcat (2)
[bc/52e3ff] Cached process > pipeline:collectFastqIngressResultsInDir (4)
[c4/334181] Cached process > pipeline:collectFastqIngressResultsInDir (3)
[b2/1d6863] Cached process > pipeline:collectFastqIngressResultsInDir (1)
[6a/4a2ce4] Cached process > pipeline:preprocess_reads (2)
[26/afd032] Cached process > pipeline:collectFastqIngressResultsInDir (5)
[53/f2576a] Cached process > pipeline:collectFastqIngressResultsInDir (6)
[04/2ace23] Cached process > pipeline:collectFastqIngressResultsInDir (2)
[2d/fc6213] Cached process > pipeline:preprocess_reads (1)
[ed/88edb6] Cached process > pipeline:preprocess_reads (5)
[d3/fa64ec] Cached process > pipeline:preprocess_reads (6)
[43/171237] Cached process > pipeline:preprocess_reads (4)
[b0/83671f] Cached process > pipeline:preprocess_reads (3)
[82/a697c7] Cached process > output (4)
[7e/cd8748] Cached process > output (3)
[81/10a321] Cached process > output (2)
[0d/fcd1f1] Cached process > output (1)
[80/2adbdd] Cached process > output (5)
[56/c97b2f] Cached process > output (6)
[0d/b30f88] Cached process > pipeline:reference_assembly:map_reads (6)
[cf/f2b6a7] Submitted process > pipeline:reference_assembly:map_reads (1)
[68/851b16] Submitted process > pipeline:reference_assembly:map_reads (4)
[7b/8bb96b] Submitted process > pipeline:reference_assembly:map_reads (5)
Error executing process > 'pipeline:reference_assembly:map_reads (4)'
Caused by:
Process pipeline:reference_assembly:map_reads (4) terminated with an error exit status (1)
Command executed:
minimap2 -t 2 -ax splice -uf genome_index.mmi t10-2_full_length_reads.fastq | samtools view -q 40 -F 2304 -Sb - | seqkit bam -j 2 -x -T 'AlnContext: { Ref: "Lalbus-20171117r1.genome.fasta", LeftShift: -24, RightShift: 24, RegexEnd: "[Aa]{8,}", Stranded: True, Invert: True, Tsv: "internal_priming_fail.tsv"} ' - | samtools sort -@ 2 -o "t10-2_reads_aln_sorted.bam" - ;
((cat "t10-2_reads_aln_sorted.bam" | seqkit bam -s -j 2 - 2>&1) | tee t10-2_read_aln_stats.tsv ) || true
if [[ -s "internal_priming_fail.tsv" ]];
then
    tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $4 }' - > "context_internal_priming_fail_start.fasta"
    tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $6 }' - > "context_internal_priming_fail_end.fasta"
fi
Command exit status:
1
Command output:
(empty)
Command error:
[WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.
[M::main::29.238*0.43] loaded/built the index for 89 target sequence(s)
[M::mm_mapopt_update::35.301*0.52] mid_occ = 758
[M::mm_idx_stat] kmer size: 14; skip: 10; is_hpc: 0; #seq: 89
[M::mm_idx_stat::36.145*0.54] distinct minimizers: 17499227 (47.42% are singletons); average occurrences: 4.769; average spacing: 5.403; total length: 450972408
[main_samview] fail to read the header from "-".
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x92ff8f]
goroutine 1 [running]:
github.com/biogo/hts/bam.(*Reader).Header(...)
    /home/shenwei/shenwei/scripts/go/pkg/mod/@.***/bam/reader.go:69
github.com/shenwei356/seqkit/v2/seqkit/cmd.BamToolbox({0x7ffd7777dc89, 0xaa7b63}, {0x7ffd7777dd3c, 0x1}, {0xaa3520, 0x1}, 0x0, 0x0, 0x2)
    /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/seqkit/seqkit/cmd/bam_toolbox.go:187 +0x84f
github.com/shenwei356/seqkit/v2/seqkit/cmd.glob..func2(0x11a5f00, {0xc00006d3e0, 0x1, 0x6})
    /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/seqkit/seqkit/cmd/bam.go:574 +0x794
github.com/spf13/cobra.(*Command).execute(0x11a5f00, {0xc00006d380, 0x6, 0x6})
    /home/shenwei/shenwei/scripts/go/pkg/mod/@.***/command.go:860 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0x11a6180)
    /home/shenwei/shenwei/scripts/go/pkg/mod/@.***/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
    /home/shenwei/shenwei/scripts/go/pkg/mod/@.***/command.go:902
github.com/shenwei356/seqkit/v2/seqkit/cmd.Execute()
    /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/seqkit/seqkit/cmd/root.go:63 +0x25
main.main()
    /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/seqkit/seqkit/main.go:58 +0x17
samtools sort: failed to read header from "-"
Work dir:
/Users/claudia/epi2melabs/instances/wf-transcriptomes_f55e1bed-43e5-4243-81aa-9916736b2f86/work/68/851b16faba91df1c30b2504917669d
Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
WARN: Killing running tasks (2)
Hi, looks like it's out of memory. With a full reference set this is a fairly memory-intensive workflow. Do you have access to a server? You may want to read the section on setting Docker resource limits in WSL at https://labs.epi2me.io/nextflow-on-windows/, increase the memory to 10 GB if possible, and then set that in the input form in EPI2ME Labs and see if it helps. You could also try setting the --minimap_index_opts parameter to e.g. -w 25 or -w 50.
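For anyone running from the command line rather than the desktop app, the index option might be passed like this. This is only a sketch: apart from --minimap_index_opts, which is named above, the flag names and paths here are placeholders, not taken from this thread, so check the workflow's own --help output.

```
nextflow run epi2me-labs/wf-transcriptomes \
    --fastq /path/to/fastq_dir \
    --minimap_index_opts "-w 25"
```

A larger minimizer window (-w) makes the minimap2 index smaller, trading some alignment sensitivity for memory.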
Sarah,
Thank you for your advice! I increased the swap memory in Docker and got much farther in the pipeline; I believe I'm almost there! But it did not complete the final steps. Attached are the sample and condition sheets I used.
Here is my error log from this morning:
Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.
The full error message was:
Error executing process > 'pipeline:differential_expression:deAnalysis'
Caused by:
Process pipeline:differential_expression:deAnalysis terminated with an error exit status (1)
Command executed:
cp stringtie.gtf annotation.gtf
echo $(realpath annotation.gtf)
echo Annotation$' 'min_samps_gene_expr$' 'min_samps_feature_expr$' 'min_gene_expr$' 'min_feature_expr > params.tsv
echo $(realpath /Users/claudia/Nanopore_data/NEW_TRY/ref_genomes/Lupin_Genome_Xu/GCA_010261695.1_La_Amiga3.1_genomic.gtf)$' '3$' '3$' '3$' '3 >> params.tsv
mkdir merged
mkdir de_analysis
mv all_counts.tsv merged/all_counts.tsv
mv params.tsv de_analysis/de_params.tsv
mv sample_condition_with_alias_from_web.csv de_analysis/coldata.tsv
de_analysis.R
Command exit status:
1
Command output:
annotation.gtf
Loading counts, conditions and parameters.
Loading annotation database.
Filtering counts using DRIMSeq.
      barcode   alias condition sample_id
t0-1  barcode01 t0-1  untreated t0-1
t0-2  barcode05 t0-2  untreated t0-2
t0-3  barcode09 t0-3  untreated t0-3
t10-1 barcode02 t10-1 treated   t10-1
t10-2 barcode06 t10-2 treated   t10-2
t10-3 barcode10 t10-3 treated   t10-3
Command error:
Warning messages:
1: package 'S4Vectors' was built under R version 4.1.3
2: package 'IRanges' was built under R version 4.1.3
3: package 'GenomeInfoDb' was built under R version 4.1.3
4: package 'GenomicRanges' was built under R version 4.1.3
5: package 'AnnotationDbi' was built under R version 4.1.3
6: package 'Biobase' was built under R version 4.1.3
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
'select()' returned 1:many mapping between keys and columns
Error in dmDSdata(counts = counts, samples = coldata) :
  all(samples$sample_id %in% colnames(counts)) is not TRUE
Calls: dmDSdata -> stopifnot
Execution halted
Hi, I think it may be something to do with your sample_ids containing "-", as I could recreate the error this way. Is there a way to just remove those from the ids?
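A quick way to apply that fix across a condition sheet is sketched below (file names are placeholders; note the sample sheet aliases would need the same renaming so the two sheets still match):

```python
# Sketch: rewrite a condition sheet so sample_ids contain no hyphens
# (e.g. "t0-1" -> "t01"), the characters that tripped up the DE step here.
import csv

def strip_hyphens(in_csv, out_csv):
    with open(in_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["sample_id"] = row["sample_id"].replace("-", "")
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["sample_id", "condition"])
        writer.writeheader()
        writer.writerows(rows)
```

Watch out for collisions: if two ids differ only by hyphen placement, removing hyphens would merge them.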
Thanks Sarah for your fast response!
I removed the hyphens in the sample_ids and am already re-running...
Sarah,
The pipeline worked almost to the end: I got normalized differential expression tables and plots, so this is great! The workflow did not create a final report; I'm posting the error message below, and if you have any suggestions, please let me know. But otherwise, I'm happy with the results I got!
Thank you so much for all your help; without it I could not have figured this out!
Claudia
Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.
The full error message was:
Error executing process > 'pipeline:makeReport (1)'
Caused by:
Process pipeline:makeReport (1) terminated with an error exit status (1)
Command executed:
if [ -f "de_report/OPTIONAL_FILE" ]; then
    dereport=""
else
    dereport="--de_report true --de_stats seqkit/*"
    mv de_report/*.gtf de_report/stringtie_merged.gtf
fi
if [ -f "gff_annotation/OPTIONAL_FILE" ]; then
    OPT_GFF=""
else
    OPT_GFF="--gffcompare_dir t03_gffcompare t103_gffcompare t101_gffcompare t102_gffcompare t01_gffcompare t02_gffcompare --gff_annotation gff_annotation/*"
fi
if [ -f "jaffal_csv/OPTIONAL_FILE" ]; then
    OPT_JAFFAL_CSV=""
else
    OPT_JAFFAL_CSV="--jaffal_csv jaffal_csv/*"
fi
if [ -f "aln_stats/OPTIONAL_FILE" ]; then
    OPT_ALN=""
else
    OPT_ALN="--alignment_stats aln_stats/*"
fi
if [ -f "pychopper_report/OPTIONAL_FILE" ]; then
    OPT_PC_REPORT=""
else
    OPT_PC_REPORT="--pychop_report pychopper_report/*"
fi
workflow-glue report --report wf-transcriptomes-report.html --versions versions.txt --params params.json $OPT_ALN $OPT_PC_REPORT --sample_ids t01 t102 t03 t103 t02 t101 --stats per-read-stats.tsv $OPT_GFF --isoform_table_nrows 5000 $OPT_JAFFAL_CSV $dereport
Command exit status: 1
Command output: (empty)
Command error:
51901 of 70136 (74%) 52602 of 70136 (75%) 53304 of 70136 (76%) 54005 of 70136 (77%) 54707 of 70136 (78%) 55408 of 70136 (79%) 56109 of 70136 (80%) 56811 of 70136 (81%) 57512 of 70136 (82%) 58213 of 70136 (83%) 58915 of 70136 (84%) 59616 of 70136 (85%) 60317 of 70136 (86%) 61019 of 70136 (87%) 61720 of 70136 (88%) 62422 of 70136 (89%) 63123 of 70136 (90%) 63824 of 70136 (91%) 64526 of 70136 (92%) 65227 of 70136 (93%) 65928 of 70136 (94%) 66630 of 70136 (95%) 67331 of 70136 (96%) 68032 of 70136 (97%) 68734 of 70136 (98%)
/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/util/_decorators.py:311: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  return func(*args, **kwargs)
Traceback (most recent call last):
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow-glue", line 7, in <module>
    cli()
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/__init__.py", line 62, in cli
    args.func(args)
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/report.py", line 946, in main
    de_section(report)
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/report.py", line 892, in de_section
    de_plots.de_section(
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/de_plots.py", line 361, in de_section
    dtu_section(dtu, section, gene_txid, gene_name)
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/de_plots.py", line 223, in dtu_section
    dtu_results["gene_name"] = dtu_results["txID"].apply(lambda x: gt_dic[x])
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/apply.py", line 1043, in apply
    return self.apply_standard()
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/apply.py", line 1098, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/de_plots.py", line 223, in <lambda>
    dtu_results["gene_name"] = dtu_results["txID"].apply(lambda x: gt_dic[x])
KeyError: 'gnl|WGSJAAEJY|Lal_00046676.1'
Work dir: /Users/claudia/epi2melabs/instances/wf-transcriptomes_76ae40a5-b5e3-40de-9d56-341ce937d04b/work/e8/0078d049645174741bc9beee975142
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
Ah, hopefully we can get the report to work. Does gnl|WGSJAAEJY|Lal_00046676.1 look like an expected transcript ID from your dataset? It looks a bit suspicious. Would you be able to share the reference_annotation input file you are using, if it's a public one? If not, maybe check it follows the gtf or gff specification: https://www.ensembl.org/info/website/upload/gff3.html
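One way to check an annotation for ids like that before running is sketched below. It is only a heuristic: it greps transcript_id attributes for a "|" character, the pattern in the KeyError above, and nothing more.

```python
# Sketch: collect transcript_id values containing "|" from a GTF,
# the NCBI-style id pattern that broke report generation in this thread.
import re

def odd_transcript_ids(gtf_path):
    pat = re.compile(r'transcript_id "([^"]+)"')
    ids = set()
    with open(gtf_path) as f:
        for line in f:
            if line.startswith("#"):
                continue  # skip comment/header lines
            m = pat.search(line)
            if m and "|" in m.group(1):
                ids.add(m.group(1))
    return ids
```

An empty result doesn't guarantee the file is spec-conformant, but a non-empty one flags ids likely to clash with downstream tools.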
Thanks Sarah. It is a public reference annotation that I downloaded from NCBI. I used the gtf, but there is also a gff3 file; I will give that a try.
Hi, I know it was a while ago now, but I have since added a fix to the workflow, so the latest version should now work with the NCBI-type gtf/gff files. Thanks for your feedback.
Awesome, I will give it a try!
Hello, I'm having similar error codes.
Relevant log output: Jun.-19 11:40:29.581 [Actor Thread 23] DEBUG nextflow.Session - Session aborted -- Cause: Join mismatch for the following entries:
Thanks for your suggestions.
Hi @DomPerignon3, your condition sheet needs to have a header sample_id,condition, and I think only 6 samples will work currently: 3 treated, 3 untreated. Although I would be interested to see what error you get with 18.
Hello, I also experience the same error. I already use the condition_sheet.csv as advised, but it fails over and over again. OS used: macOS 13.1. Workflow execution: wf-transcriptomes CLI. I put the .fastq files in one folder.
This is epi2me-labs/wf-transcriptomes v0.1.13-gcb73e0a.
--------------------------------------------------------------------------------
Checking fastq input.
executor > local (5)
[4f/388005] process > fastcat (1) [100%] 1 of 1 ✔
[62/a91b6a] process > pipeline:preprocess_ref_ann... [100%] 1 of 1 ✔
[- ] process > pipeline:collectFastqIngres... [ 0%] 0 of 1
[e7/62d106] process > pipeline:getVersions [100%] 1 of 1 ✔
[7e/4ba125] process > pipeline:getParams [100%] 1 of 1 ✔
[- ] process > pipeline:preprocess_reads [ 0%] 0 of 1
[f5/054c51] process > pipeline:build_minimap_inde... [ 0%] 0 of 1
[- ] process > pipeline:reference_assembly... -
[- ] process > pipeline:split_bam -
[- ] process > pipeline:assemble_transcripts -
[- ] process > pipeline:merge_gff_bundles -
[- ] process > pipeline:run_gffcompare -
[- ] process > pipeline:get_transcriptome -
[- ] process > pipeline:merge_transcriptomes -
[- ] process > pipeline:differential_expre... -
[- ] process > pipeline:differential_expre... -
[- ] process > pipeline:differential_expre... -
[- ] process > pipeline:differential_expre... -
[- ] process > pipeline:differential_expre... -
[- ] process > pipeline:differential_expre... -
[- ] process > pipeline:differential_expre... -
[- ] process > pipeline:makeReport -
[- ] process > output -
Join mismatch for the following entries:
- key=barcode06 values=[]
- key=barcode05 values=[]
- key=barcode02 values=[]
- key=barcode01 values=[]
- key=barcode04 values=[]
- key=barcode03 values=[]
- key=reads values=[/Users/cmetadea/Documents/MacDocs/RNAseq-analysis/work/4f/3880057d077fb019ea2caa8eb59c7d/seqs.fastq.gz]
WARN: Killing running tasks (1)
Any suggestions? Thanks!
Hi, did you also add a sample sheet with the same list of entries? We are updating the workflow in the near future to not use this condition sheet as it is causing confusion.
I did not add a sample sheet as I thought I didn't need one (all .fastq files are in one folder, no sub-directories); do I still need one? The documentation says "The sample sheet can be provided when the input data is a directory containing sub-directories with FASTQ file"?
Hi, ah okay, no, you don't need a sample_sheet, but for the differential expression subworkflow you will need to put the fastqs in subdirectories named e.g. barcode01, with their respective fastq files in each directory. If your data is not demultiplexed you will need to do that with wf-demultiplex https://github.com/epi2me-labs/wf-demultiplex
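Laid out on disk, the input directory for the differential expression subworkflow would then look something like this (the directory and file names are illustrative, not prescribed):

```
fastq_dir/
├── barcode01/
│   └── reads0.fastq.gz
├── barcode02/
│   └── reads0.fastq.gz
└── barcode03/
    └── reads0.fastq.gz
```

Each barcode subdirectory becomes one sample, so its name is what the condition sheet entries need to match.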
I did try to put the files into subdirectories. The first error was because my laptop runs on Apple silicon, so I needed to modify Docker. After running again, it hit another error:
ERROR ~ Error executing process > 'pipeline:reference_assembly:map_reads (1)'
Caused by:
Process `pipeline:reference_assembly:map_reads (1)` terminated with an error exit status (1)
Command executed:
minimap2 -t 4 -ax splice -uf genome_index.mmi barcode05_full_length_reads.fastq | samtools view -q 40 -F 2304 -Sb - | seqkit bam -j 4 -x -T 'AlnContext: { Ref: "tn2_new.fasta", LeftShift: -24,
RightShift: 24, RegexEnd: "[Aa]{8,}",
Stranded: True,Invert: True, Tsv: "internal_priming_fail.tsv"} ' - | samtools sort -@ 4 -o "barcode05_reads_aln_sorted.bam" - ;
((cat "barcode05_reads_aln_sorted.bam" | seqkit bam -s -j 4 - 2>&1) | tee barcode05_read_aln_stats.tsv ) || true
if [[ -s "internal_priming_fail.tsv" ]];
then
tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $4 }' - > "context_internal_priming_fail_start.fasta"
tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $6 }' - > "context_internal_priming_fail_end.fasta"
fi
Command exit status:
1
Command output:
(empty)
Command error:
[WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.
[M::main::0.096*0.77] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.105*0.77] mid_occ = 10
[M::mm_idx_stat] kmer size: 14; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.108*0.77] distinct minimizers: 281156 (94.65% are singletons); average occurrences: 1.063; average spacing: 5.368; total length: 1604952
[INFO] create FASTA index for tn2_new.fasta
[ERRO] different line length in sequence: NZ_AP019730.1
samtools sort: failed to read header from "-"
Work dir:
/Users/cmetadea/Documents/MacDocs/RNAseq-analysis/work/94/6ec641cc5a59a6b8c97e022dfa0630
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
WARN: Killing running tasks (1)
Any suggestions? Thanks!
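The "[ERRO] different line length in sequence" message above comes from FASTA indexing, which expects each sequence to be wrapped at a uniform line width. One way to normalise the reference before rerunning is sketched below (seqkit's own `seqkit seq -w 60 in.fasta > out.fasta` should also do this):

```python
# Sketch: rewrap a FASTA so every sequence line has a uniform width,
# so that faidx-style indexing (as seqkit attempts above) can work.
def rewrap_fasta(in_path, out_path, width=60):
    with open(in_path) as fin, open(out_path, "w") as fout:
        seq = []

        def flush():
            # Write any buffered sequence in fixed-width chunks.
            joined = "".join(seq)
            for i in range(0, len(joined), width):
                fout.write(joined[i:i + width] + "\n")
            seq.clear()

        for line in fin:
            line = line.rstrip("\n")
            if line.startswith(">"):
                flush()
                fout.write(line + "\n")
            else:
                seq.append(line)
        flush()
```

Only the line breaks change; the sequence content and headers are preserved.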
My guess is either there is not enough memory available, or there are no alignments between that particular FASTQ and the reference. How big is the reference file and how much memory do you have available?
Closing through lack of response, and original issue here has been resolved as we have removed use of condition sheet. Feel free to open a new issue if required.
Sorry this took me so long, but yes, the workflow is now working for me with the NCBI gtf file.
What happened?
Hello, I have used oxford nanopore's direct cDNA sequencing kit to obtain transcript information from 18 different samples. Right now I am trying to run a differential expression analysis between 6 of them as set up below:
sample_id condition
barcode01 untreated
barcode02 untreated
barcode03 untreated
barcode10 treated
barcode11 treated
barcode12 treated
When I run the differential expression analysis pipeline it gives me the following error:
Join mismatch for the following entries:
This error repeats for all of my samples. I have checked the names and they are the same on the TSV as the files themselves. Does anyone know what might be causing this error?
Thank you!
Operating System
Windows 10
Workflow Execution
EPI2ME Labs desktop application
Workflow Execution - EPI2ME Labs Versions
EPI2ME Labs V4.1.3
Workflow Execution - CLI Execution Profile
None
Workflow Version
wf-transcriptomes
Relevant log output