epi2me-labs / wf-transcriptomes


Mismatch Error with Files #16

Closed topasnaa closed 11 months ago

topasnaa commented 1 year ago

What happened?

Hello, I have used Oxford Nanopore's direct cDNA sequencing kit to obtain transcript information from 18 different samples. Right now I am trying to run a differential expression analysis between 6 of them, set up as below:

sample_id  condition
barcode01  untreated
barcode02  untreated
barcode03  untreated
barcode10  treated
barcode11  treated
barcode12  treated

When I run the differential expression analysis pipeline it gives me the following error:

Join mismatch for the following entries:

This error repeats for all of my samples. I have checked the names and they are the same on the TSV as the files themselves. Does anyone know what might be causing this error?

Thank you!

Operating System

Windows 10

Workflow Execution

EPI2ME Labs desktop application

Workflow Execution - EPI2ME Labs Versions

EPI2ME Labs V4.1.3

Workflow Execution - CLI Execution Profile

None

Workflow Version

wf-transcriptomes

Relevant log output

Reference Transcriptome provided will be used for differential expression.
Checking fastq input.
Barcoded directories detected.
test
WARN: Access to undefined parameter `workDir` -- Initialise it to a default value eg. `params.workDir = some_value`
null
[9a/6914af] Submitted process > pipeline:summariseConcatReads (6)
[8f/57a726] Submitted process > pipeline:summariseConcatReads (3)
[67/c8a89b] Submitted process > pipeline:getVersions
[90/764215] Submitted process > pipeline:summariseConcatReads (5)
[f1/53efd2] Submitted process > pipeline:summariseConcatReads (1)
[98/ecb8f2] Submitted process > pipeline:summariseConcatReads (4)
[43/578a93] Submitted process > pipeline:summariseConcatReads (2)
[2b/089de0] Submitted process > pipeline:getParams
[5f/10c71c] Submitted process > pipeline:differential_expression:build_minimap_index_transcriptome
[55/6fbad0] Submitted process > pipeline:differential_expression:map_transcriptome (3)
[63/bfaa4c] Submitted process > pipeline:differential_expression:map_transcriptome (2)
Join mismatch for the following entries: 
- key= values=[],[],[],[],[],[] 
- key=barcode02 values=[/mnt/wsl/docker-desktop-bind-mounts/Ubuntu/dbb89c88207f1b960a480510825452f910bcfcda683e0bcf73b9f4774011e77a/instances/wf-transcriptomes_85450e8a-73c6-4b58-84dc-74a9e946f3d6/work/43/578a93037ce59c2d606c2226ed4601/barcode02.fastq] 
- key=barcode12 values=[/mnt/wsl/docker-desktop-bind-mounts/Ubuntu/dbb89c88207f1b960a480510825452f910bcfcda683e0bcf73b9f4774011e77a/instances/wf-transcriptomes_85450e8a-73c6-4b58-84dc-74a9e946f3d6/work/9a/6914afba6a575b0e45f1d8a74d4822/barcode12.fastq] 
- key=barcode01 values=[/mnt/wsl/docker-desktop-bind-mounts/Ubuntu/dbb89c88207f1b960a480510825452f910bcfcda683e0bcf73b9f4774011e77a/instances/wf-transcriptomes_85450e8a-73c6-4b58-84dc-74a9e946f3d6/work/f1/53efd25c5a0d2b553f0235260ed45f/barcode01.fastq] 
- key=barcode03 values=[/mnt/wsl/docker-desktop-bind-mounts/Ubuntu/dbb89c88207f1b960a480510825452f910bcfcda683e0bcf73b9f4774011e77a/instances/wf-transcriptomes_85450e8a-73c6-4b58-84dc-74a9e946f3d6/work/8f/57a726df0c1b02109a2ce647de8307/barcode03.fastq] 
- key=barcode11 values=[/mnt/wsl/docker-desktop-bind-mounts/Ubuntu/dbb89c88207f1b960a480510825452f910bcfcda683e0bcf73b9f4774011e77a/instances/wf-transcriptomes_85450e8a-73c6-4b58-84dc-74a9e946f3d6/work/90/764215eac117a4f49364d4c264b6ff/barcode11.fastq] 
- key=barcode10 values=[/mnt/wsl/docker-desktop-bind-mounts/Ubuntu/dbb89c88207f1b960a480510825452f910bcfcda683e0bcf73b9f4774011e77a/instances/wf-transcriptomes_85450e8a-73c6-4b58-84dc-74a9e946f3d6/work/98/ecb8f2175038ce6ec4b9f9eda66a05/barcode10.fastq]
WARN: Killing running tasks (2)
sarahjeeeze commented 1 year ago

Hi, sorry about that. The input condition sheet should actually be a CSV. Let me know if it works with the condition sheet as:

sample_id,condition
barcode01,untreated
barcode02,untreated
barcode03,untreated
barcode10,treated
barcode11,treated
barcode12,treated
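If an existing sheet is tab-separated, converting it is a one-liner. This is a minimal sketch; the file names are placeholders, and the example rows are just the barcodes from this thread.

```shell
# Create a small tab-separated condition sheet for demonstration.
printf 'sample_id\tcondition\nbarcode01\tuntreated\nbarcode10\ttreated\n' > condition_sheet.tsv

# Replace tabs with commas to get the CSV layout --condition_sheet expects.
tr '\t' ',' < condition_sheet.tsv > condition_sheet.csv

# Sanity-check the header before launching the workflow.
head -n 1 condition_sheet.csv   # prints: sample_id,condition
```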

I have since updated the documentation to reflect this.

Claudia-Stone commented 1 year ago

I have the same problem. I changed my condition sheet to csv, but I still got the "join mismatch" error.

sarahjeeeze commented 1 year ago

Hi, I've been unable to recreate the error other than if I use a tsv instead of csv with commas in it. Are you providing your new csv file to the --condition_sheet parameter? Does your error message look exactly the same as above?

Claudia-Stone commented 1 year ago

Sarah,

Thank you for getting back to me!

Using CSV instead of TSV certainly helped; the process ran for longer, but eventually it still aborted, strangely with the same error message (copied below).

I wonder if there is a problem with my sample sheet, would you be able to share a sample sheet that works for you?

Below is an error message from a recent attempt. Please let me know if it would be helpful if I send any files.

Claudia

P.S. I cc-ed my student Tahmina who is also stuck

This is epi2me-labs/wf-transcriptomes v0.1.10.
--------------------------------------------------------------------------------
Checking fastq input.
Doing reference based transcript analysis
[46/67f78b] Submitted process > pipeline:getVersions
[42/523697] Submitted process > pipeline:getParams
[b2/110741] Submitted process > validate_sample_sheet
[8f/91837b] Submitted process > pipeline:build_minimap_index
[fc/042631] Submitted process > fastcat (1)
[94/631344] Submitted process > fastcat (6)
[b7/0bff9e] Submitted process > fastcat (3)
[cf/1b46e0] Submitted process > fastcat (2)
[5f/2f1f02] Submitted process > fastcat (5)
[22/198afb] Submitted process > fastcat (4)
[3b/c47669] Submitted process > pipeline:collectFastqIngressResultsInDir (1)
[aa/f33a47] Submitted process > pipeline:collectFastqIngressResultsInDir (2)
[a5/74bd05] Submitted process > pipeline:collectFastqIngressResultsInDir (3)
[84/eb16d7] Submitted process > pipeline:collectFastqIngressResultsInDir (4)
[18/501f09] Submitted process > pipeline:collectFastqIngressResultsInDir (5)
[9d/e63f24] Submitted process > output (2)
[47/1f071a] Submitted process > output (1)
[4d/b85d05] Submitted process > output (3)
[0c/8a1c27] Submitted process > output (4)
[72/bdf7d4] Submitted process > output (5)
[90/b69bad] Submitted process > pipeline:preprocess_reads (1)
Join mismatch for the following entries:

WARN: Killing running tasks (1)


sarahjeeeze commented 1 year ago

Hi, if you are using a sample sheet, the alias column in the sample sheet should match the sample_id column in the condition sheet. I can update the documentation to make this clearer.
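A quick way to confirm the two sheets agree before launching the workflow is to compare the two columns. This is a hedged sketch: the file names, column positions, and example rows are assumptions; adjust the `cut` field numbers to your actual headers.

```shell
# Tiny example sheets in the layout discussed above (illustrative only).
printf 'barcode,alias,type\nbarcode01,s1,test_sample\n' > sample_sheet.csv
printf 'sample_id,condition\ns1,untreated\n' > condition_sheet.csv

# Extract the alias column (field 2) and the sample_id column (field 1),
# dropping the header rows, and sort both for comparison.
cut -d, -f2 sample_sheet.csv | tail -n +2 | sort > aliases.txt
cut -d, -f1 condition_sheet.csv | tail -n +2 | sort > sample_ids.txt

# comm -3 prints entries found in only one of the two files;
# empty output means the sheets agree.
comm -3 aliases.txt sample_ids.txt
```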

Claudia-Stone commented 1 year ago

Thank you, Sarah!

For my sample sheet, I have columns for barcode, sample_id, and alias. What about a type column in the sample sheet? I want to do differential expression, so I was thinking I would need type "control" versus type "test_sample", but it does not seem to allow "control". It allows "positive_control", but technically that's not what I would call my untreated samples. Do I use "test_sample" for all, including the controls?

For my condition sheet, I have 2 columns, barcode and condition (treated, untreated), do I need anything else here?

Also, the run now aborts at the preprocessing step with pychopper; are there any pychopper options I should change for mapping my cDNA to genomic DNA?

Claudia


sarahjeeeze commented 1 year ago

That should be fine for the sample sheet and condition sheet; you don't need type for this workflow. Could you share the error message you get? You can find it by clicking on the Logs tab in EPI2ME Labs, or by clicking Open Instance in the Overview tab; there should be a nextflow log/txt file.

Claudia-Stone commented 1 year ago

here is a recent nextflow.log file


Claudia-Stone commented 1 year ago

Sarah, today pychopper seems to have worked, but the pipeline aborted at the next step, the RNA mapping.

Attached is a more recent nextflow.log


sarahjeeeze commented 1 year ago

Hi, I don't think I can access the log attachments but could you maybe just copy and paste the last lines or error message?

Claudia-Stone commented 1 year ago


This is epi2me-labs/wf-transcriptomes v0.1.10.



Claudia-Stone commented 1 year ago

Sarah,

I also tried a different reference genome; it stops also at the RNA mapping, but with different error messages:

This is epi2me-labs/wf-transcriptomes v0.1.10.
Checking fastq input.
Doing reference based transcript analysis
[dd/69341c] Cached process > pipeline:build_minimap_index
[03/d652cd] Cached process > validate_sample_sheet
[37/631a2b] Cached process > pipeline:getParams
[e1/ec2de5] Cached process > pipeline:getVersions
[f9/92ecc3] Cached process > fastcat (6)
[41/03d52b] Cached process > fastcat (1)
[b3/74b383] Cached process > fastcat (5)
[9a/2bae99] Cached process > fastcat (3)
[ea/05bacf] Cached process > fastcat (4)
[67/8cdc7e] Cached process > fastcat (2)
[bc/52e3ff] Cached process > pipeline:collectFastqIngressResultsInDir (4)
[c4/334181] Cached process > pipeline:collectFastqIngressResultsInDir (3)
[b2/1d6863] Cached process > pipeline:collectFastqIngressResultsInDir (1)
[6a/4a2ce4] Cached process > pipeline:preprocess_reads (2)
[26/afd032] Cached process > pipeline:collectFastqIngressResultsInDir (5)
[53/f2576a] Cached process > pipeline:collectFastqIngressResultsInDir (6)
[04/2ace23] Cached process > pipeline:collectFastqIngressResultsInDir (2)
[2d/fc6213] Cached process > pipeline:preprocess_reads (1)
[ed/88edb6] Cached process > pipeline:preprocess_reads (5)
[d3/fa64ec] Cached process > pipeline:preprocess_reads (6)
[43/171237] Cached process > pipeline:preprocess_reads (4)
[b0/83671f] Cached process > pipeline:preprocess_reads (3)
[82/a697c7] Cached process > output (4)
[7e/cd8748] Cached process > output (3)
[81/10a321] Cached process > output (2)
[0d/fcd1f1] Cached process > output (1)
[80/2adbdd] Cached process > output (5)
[56/c97b2f] Cached process > output (6)
[0d/b30f88] Cached process > pipeline:reference_assembly:map_reads (6)
[cf/f2b6a7] Submitted process > pipeline:reference_assembly:map_reads (1)
[68/851b16] Submitted process > pipeline:reference_assembly:map_reads (4)
[7b/8bb96b] Submitted process > pipeline:reference_assembly:map_reads (5)
Error executing process > 'pipeline:reference_assembly:map_reads (4)'

Caused by:
  Process pipeline:reference_assembly:map_reads (4) terminated with an error exit status (1)

Command executed:

  minimap2 -t 2 -ax splice -uf genome_index.mmi t10-2_full_length_reads.fastq \
    | samtools view -q 40 -F 2304 -Sb - \
    | seqkit bam -j 2 -x -T 'AlnContext: { Ref: "Lalbus-20171117r1.genome.fasta", LeftShift: -24, RightShift: 24, RegexEnd: "[Aa]{8,}", Stranded: True, Invert: True, Tsv: "internal_priming_fail.tsv" }' - \
    | samtools sort -@ 2 -o "t10-2_reads_aln_sorted.bam" - ;
  ((cat "t10-2_reads_aln_sorted.bam" | seqkit bam -s -j 2 - 2>&1) | tee t10-2_read_aln_stats.tsv ) || true
  if [[ -s "internal_priming_fail.tsv" ]];
  then
      tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $4 }' - > "context_internal_priming_fail_start.fasta"
      tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $6 }' - > "context_internal_priming_fail_end.fasta"
  fi

Command exit status:
  1

Command output:
  (empty)

Command error:
  [WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.
  [M::main::29.238*0.43] loaded/built the index for 89 target sequence(s)
  [M::mm_mapopt_update::35.301*0.52] mid_occ = 758
  [M::mm_idx_stat] kmer size: 14; skip: 10; is_hpc: 0; #seq: 89
  [M::mm_idx_stat::36.145*0.54] distinct minimizers: 17499227 (47.42% are singletons); average occurrences: 4.769; average spacing: 5.403; total length: 450972408
  [main_samview] fail to read the header from "-".
  panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x92ff8f]
  goroutine 1 [running]:
  github.com/biogo/hts/bam.(*Reader).Header(...)
          /home/shenwei/shenwei/scripts/go/pkg/mod/@.***/bam/reader.go:69
  github.com/shenwei356/seqkit/v2/seqkit/cmd.BamToolbox({0x7ffd7777dc89, 0xaa7b63}, {0x7ffd7777dd3c, 0x1}, {0xaa3520, 0x1}, 0x0, 0x0, 0x2)
          /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/seqkit/seqkit/cmd/bam_toolbox.go:187 +0x84f
  github.com/shenwei356/seqkit/v2/seqkit/cmd.glob..func2(0x11a5f00, {0xc00006d3e0, 0x1, 0x6})
          /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/seqkit/seqkit/cmd/bam.go:574 +0x794
  github.com/spf13/cobra.(*Command).execute(0x11a5f00, {0xc00006d380, 0x6, 0x6})
          /home/shenwei/shenwei/scripts/go/pkg/mod/@.***/command.go:860 +0x5f8
  github.com/spf13/cobra.(*Command).ExecuteC(0x11a6180)
          /home/shenwei/shenwei/scripts/go/pkg/mod/@.***/command.go:974 +0x3bc
  github.com/spf13/cobra.(*Command).Execute(...)
          /home/shenwei/shenwei/scripts/go/pkg/mod/@.***/command.go:902
  github.com/shenwei356/seqkit/v2/seqkit/cmd.Execute()
          /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/seqkit/seqkit/cmd/root.go:63 +0x25
  main.main()
          /home/shenwei/shenwei/scripts/go/src/github.com/shenwei356/seqkit/seqkit/main.go:58 +0x17
  samtools sort: failed to read header from "-"

Work dir:
  /Users/claudia/epi2melabs/instances/wf-transcriptomes_f55e1bed-43e5-4243-81aa-9916736b2f86/work/68/851b16faba91df1c30b2504917669d

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

WARN: Killing running tasks (2)


sarahjeeeze commented 1 year ago

Hi, it looks like it's out of memory. With a full reference set this is a fairly memory-intensive workflow. Do you have access to a server? You may want to read the section on setting Docker resource limits in WSL at https://labs.epi2me.io/nextflow-on-windows/ and increase the memory to 10 GB if possible, then set that in the input form in EPI2ME Labs and see if it helps. You could also try setting --minimap_index_opts to e.g. -w 25 or -w 50.
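For reference, a `.wslconfig` along the lines suggested above might look like the sketch below. The 10GB figure follows the suggestion in this thread; the swap value is an arbitrary example. The real file lives at %UserProfile%\.wslconfig on the Windows side (this sketch writes a local example file instead so nothing is clobbered), and WSL needs a restart with `wsl --shutdown` for it to take effect.

```shell
# Write an example .wslconfig; copy it to %UserProfile%\.wslconfig to use it.
cat > wslconfig.example <<'EOF'
[wsl2]
memory=10GB
swap=8GB
EOF

cat wslconfig.example
```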

Claudia-Stone commented 1 year ago

Sarah,

Thank you for your advice! I increased the swap memory on Docker, and got so much farther in the pipeline! I believe I'm almost there! But then it did not do the final steps. Attached are the sample and condition sheets I used.

Here is my error log from this morning: Workflow execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'pipeline:differential_expression:deAnalysis'

Caused by: Process pipeline:differential_expression:deAnalysis terminated with an error exit status (1)

Command executed:

cp stringtie.gtf annotation.gtf
echo $(realpath annotation.gtf)
echo Annotation$' 'min_samps_gene_expr$' 'min_samps_feature_expr$' 'min_gene_expr$' 'min_feature_expr > params.tsv
echo $(realpath /Users/claudia/Nanopore_data/NEW_TRY/ref_genomes/Lupin_Genome_Xu/GCA_010261695.1_La_Amiga3.1_genomic.gtf)$' '3$' '3$' '3$' '3 >> params.tsv
mkdir merged
mkdir de_analysis
mv all_counts.tsv merged/all_counts.tsv
mv params.tsv de_analysis/de_params.tsv
mv sample_condition_with_alias_from_web.csv de_analysis/coldata.tsv
de_analysis.R

Command exit status: 1

Command output:
annotation.gtf
Loading counts, conditions and parameters.
Loading annotation database.
Filtering counts using DRIMSeq.
        barcode    alias  condition  sample_id
t0-1    barcode01  t0-1   untreated  t0-1
t0-2    barcode05  t0-2   untreated  t0-2
t0-3    barcode09  t0-3   untreated  t0-3
t10-1   barcode02  t10-1  treated    t10-1
t10-2   barcode06  t10-2  treated    t10-2
t10-3   barcode10  t10-3  treated    t10-3

Command error:
Warning messages:
1: package 'S4Vectors' was built under R version 4.1.3
2: package 'IRanges' was built under R version 4.1.3
3: package 'GenomeInfoDb' was built under R version 4.1.3
4: package 'GenomicRanges' was built under R version 4.1.3
5: package 'AnnotationDbi' was built under R version 4.1.3
6: package 'Biobase' was built under R version 4.1.3
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
'select()' returned 1:many mapping between keys and columns
Error in dmDSdata(counts = counts, samples = coldata) :
  all(samples$sample_id %in% colnames(counts)) is not TRUE
Calls: dmDSdata -> stopifnot
Execution halted


sarahjeeeze commented 1 year ago

Hi, I think it may be something to do with your sample_ids containing hyphens (-), as I could recreate the error that way. Is there a way to just remove those from the IDs?
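One way to strip the hyphens from an existing condition sheet is sketched below. The file names and IDs are illustrative; remember that the matching fastq directory names and sample-sheet aliases need the same treatment, or the sheets will no longer agree.

```shell
# Example condition sheet with hyphenated sample_ids (illustrative).
printf 'sample_id,condition\nt0-1,untreated\nt10-1,treated\n' > conditions_raw.csv

# Remove '-' from every line except the header.
awk 'NR==1 {print; next} {gsub(/-/, ""); print}' conditions_raw.csv > conditions_fixed.csv

cat conditions_fixed.csv
# sample_id,condition
# t01,untreated
# t101,treated
```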

Claudia-Stone commented 1 year ago

Thanks Sarah for your fast response!

I removed the hyphens in the sample_ids and am already re-running...


Claudia-Stone commented 1 year ago

Sarah,

The pipeline worked almost to the end; I got normalized differential expression tables and plots, so this is great! The workflow did not create a final report, though. I post the error message below, and if you have any suggestions, please let me know. But otherwise, I'm happy with the results I got!

Thank you so much for all your help; without it I could not have figured this out!

Claudia

Workflow execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'pipeline:makeReport (1)'

Caused by: Process pipeline:makeReport (1) terminated with an error exit status (1)

Command executed:

if [ -f "de_report/OPTIONAL_FILE" ]; then
    dereport=""
else
    dereport="--de_report true --de_stats "seqkit/""
    mv de_report/.gtf de_report/stringtie_merged.gtf
fi
if [ -f "gff_annotation/OPTIONAL_FILE" ]; then
    OPT_GFF=""
else
    OPT_GFF="--gffcompare_dir t03_gffcompare t103_gffcompare t101_gffcompare t102_gffcompare t01_gffcompare t02_gffcompare --gff_annotation gff_annotation/*"
fi
if [ -f "jaffal_csv/OPTIONAL_FILE" ]; then
    OPT_JAFFAL_CSV=""
else
    OPT_JAFFAL_CSV="--jaffal_csv jaffal_csv/"
fi
if [ -f "aln_stats/OPTIONAL_FILE" ]; then
    OPT_ALN=""
else
    OPT_ALN="--alignment_stats aln_stats/"
fi
if [ -f "pychopper_report/OPTIONAL_FILE" ]; then
    OPT_PC_REPORT=""
else
    OPT_PC_REPORT="--pychop_report pychopper_report/*"
fi
workflow-glue report --report wf-transcriptomes-report.html --versions versions.txt --params params.json $OPT_ALN $OPT_PC_REPORT --sample_ids t01 t102 t03 t103 t02 t101 --stats per-read-stats.tsv $OPT_GFF --isoform_table_nrows 5000 $OPT_JAFFAL_CSV $dereport

Command exit status: 1

Command output: (empty)

Command error:
51901 of 70136 (74%) 52602 of 70136 (75%) 53304 of 70136 (76%) 54005 of 70136 (77%) 54707 of 70136 (78%) 55408 of 70136 (79%) 56109 of 70136 (80%) 56811 of 70136 (81%) 57512 of 70136 (82%) 58213 of 70136 (83%) 58915 of 70136 (84%) 59616 of 70136 (85%) 60317 of 70136 (86%) 61019 of 70136 (87%) 61720 of 70136 (88%) 62422 of 70136 (89%) 63123 of 70136 (90%) 63824 of 70136 (91%) 64526 of 70136 (92%) 65227 of 70136 (93%) 65928 of 70136 (94%) 66630 of 70136 (95%) 67331 of 70136 (96%) 68032 of 70136 (97%) 68734 of 70136 (98%)
/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/util/_decorators.py:311: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  return func(*args, **kwargs)
Traceback (most recent call last):
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow-glue", line 7, in <module>
    cli()
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/__init__.py", line 62, in cli
    args.func(args)
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/report.py", line 946, in main
    de_section(report)
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/report.py", line 892, in de_section
    de_plots.de_section(
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/de_plots.py", line 361, in de_section
    dtu_section(dtu, section, gene_txid, gene_name)
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/de_plots.py", line 223, in dtu_section
    dtu_results["gene_name"] = dtu_results["txID"].apply(lambda x: gt_dic[x])
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/apply.py", line 1043, in apply
    return self.apply_standard()
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/core/apply.py", line 1098, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "/Users/claudia/epi2melabs/workflows/epi2me-labs/wf-transcriptomes/bin/workflow_glue/de_plots.py", line 223, in <lambda>
    dtu_results["gene_name"] = dtu_results["txID"].apply(lambda x: gt_dic[x])
KeyError: 'gnl|WGSJAAEJY|Lal_00046676.1'

Work dir: /Users/claudia/epi2melabs/instances/wf-transcriptomes_76ae40a5-b5e3-40de-9d56-341ce937d04b/work/e8/0078d049645174741bc9beee975142

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run


sarahjeeeze commented 1 year ago

Ah, hopefully we can get the report to work. Does gnl|WGSJAAEJY|Lal_00046676.1 look like an expected transcript ID from your dataset? It looks a bit suspicious. Would you be able to share the reference_annotation input file you are using, if it's a public one? If not, maybe check that it follows the GTF or GFF specification: https://www.ensembl.org/info/website/upload/gff3.html
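If you want to inspect the IDs yourself, something like the sketch below pulls every transcript_id out of a GTF so you can spot unusual characters like '|'. The sample GTF line is fabricated here in the style of the ID above; point the grep at your real annotation file instead.

```shell
# Fabricated one-line GTF in the NCBI WGS naming style (for demonstration).
printf 'chr1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "gnl|WGSJAAEJY|Lal_00046676.1";\n' > ref.gtf

# List the unique transcript_id attributes; eyeball them for odd characters.
grep -o 'transcript_id "[^"]*"' ref.gtf | sort -u
# prints: transcript_id "gnl|WGSJAAEJY|Lal_00046676.1"
```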

Claudia-Stone commented 1 year ago

Thanks Sarah. It is a public reference annotation that I downloaded from NCBI. I used the gtf, but there is also a gff3 file; I will give that a try.


sarahjeeeze commented 1 year ago

Hi, I know it was a while ago now, but I have since added a fix to the workflow, so the latest version should work with NCBI-style GTF/GFF files. Thanks for your feedback.

Claudia-Stone commented 1 year ago

Awesome, I will give it a try!


DomPerignon3 commented 1 year ago

Hello, I'm having similar error codes.

Relevant log output: Jun.-19 11:40:29.581 [Actor Thread 23] DEBUG nextflow.Session - Session aborted -- Cause: Join mismatch for the following entries:

Thanks for your suggestions.

sarahjeeeze commented 1 year ago

Hi @DomPerignon3, your condition sheet needs a header sample_id,condition, and I think only 6 samples will work currently: 3 treated, 3 untreated. Although I would be interested to see what error you get with 18.

cmetadea commented 1 year ago

Hello, I also experience the same error. I already use the condition_sheet.csv as advised, but it fails over and over again.
OS used: macOS 13.1
Workflow execution: wf-transcriptomes CLI
I put the .fastq files in one folder.

This is epi2me-labs/wf-transcriptomes v0.1.13-gcb73e0a.
--------------------------------------------------------------------------------
Checking fastq input.
[-        ] process > fastcat                        -
[-        ] process > pipeline:preprocess_ref_ann... -
executor >  local (5)
[4f/388005] process > fastcat (1)                    [100%] 1 of 1 ✔
[62/a91b6a] process > pipeline:preprocess_ref_ann... [100%] 1 of 1 ✔
[-        ] process > pipeline:collectFastqIngres... [  0%] 0 of 1
[e7/62d106] process > pipeline:getVersions           [100%] 1 of 1 ✔
[7e/4ba125] process > pipeline:getParams             [100%] 1 of 1 ✔
[-        ] process > pipeline:preprocess_reads      [  0%] 0 of 1
[f5/054c51] process > pipeline:build_minimap_inde... [  0%] 0 of 1
[-        ] process > pipeline:reference_assembly... -
[-        ] process > pipeline:split_bam             -
[-        ] process > pipeline:assemble_transcripts  -
[-        ] process > pipeline:merge_gff_bundles     -
[-        ] process > pipeline:run_gffcompare        -
[-        ] process > pipeline:get_transcriptome     -
[-        ] process > pipeline:merge_transcriptomes  -
[-        ] process > pipeline:differential_expre... -
[-        ] process > pipeline:differential_expre... -
[-        ] process > pipeline:differential_expre... -
[-        ] process > pipeline:differential_expre... -
[-        ] process > pipeline:differential_expre... -
[-        ] process > pipeline:differential_expre... -
[-        ] process > pipeline:differential_expre... -
[-        ] process > pipeline:makeReport            -
[-        ] process > output                         -
Join mismatch for the following entries: 
- key=barcode06 values=[] 
- key=barcode05 values=[] 
- key=barcode02 values=[] 
- key=barcode01 values=[] 
- key=barcode04 values=[] 
- key=barcode03 values=[] 
- key=reads values=[/Users/cmetadea/Documents/MacDocs/RNAseq-analysis/work/4f/3880057d077fb019ea2caa8eb59c7d/seqs.fastq.gz]

WARN: Killing running tasks (1)

Any suggestions? Thanks!

sarahjeeeze commented 1 year ago

Hi, did you also add a sample sheet with the same list of entries? We are updating the workflow in the near future to not use this condition sheet as it is causing confusion.

cmetadea commented 1 year ago

I did not add a sample sheet as I thought I didn't need one (all .fastq files are in one folder, no sub-directories). Do I still need one? The documentation says "The sample sheet can be provided when the input data is a directory containing sub-directories with FASTQ files"?

sarahjeeeze commented 1 year ago

Hi, ah okay, no, you don't need a sample sheet, but for the differential expression subworkflow you will need to put the FASTQs into subdirectories named e.g. barcode01, with each sample's FASTQ files in its own directory. If your data is not demultiplexed, you will need to do that first with wf-demultiplex: https://github.com/epi2me-labs/wf-demultiplex
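If the reads are already demultiplexed but sitting in one flat folder, the expected layout can be produced by grouping files on their barcode prefix. A minimal sketch, assuming filenames start with `barcodeNN` (the helper and demo paths are hypothetical, not part of the workflow):

```python
import re
import shutil
import tempfile
from pathlib import Path


def demux_into_subdirs(fastq_dir):
    """Move files like barcode01_xyz.fastq.gz into per-barcode
    subdirectories (fastq_dir/barcode01/...), the layout the
    differential expression subworkflow expects."""
    fastq_dir = Path(fastq_dir)
    for fq in sorted(fastq_dir.glob("*.fastq*")):
        m = re.match(r"(barcode\d+)", fq.name)
        if not m:
            continue  # leave files without a barcode prefix alone
        sub = fastq_dir / m.group(1)
        sub.mkdir(exist_ok=True)
        shutil.move(str(fq), str(sub / fq.name))


# Demo on a throwaway directory with empty placeholder files.
demo = Path(tempfile.mkdtemp())
for name in ["barcode01_pass.fastq.gz", "barcode02_pass.fastq.gz"]:
    (demo / name).touch()
demux_into_subdirs(demo)
print(sorted(p.relative_to(demo).as_posix() for p in demo.rglob("*.fastq.gz")))
# → ['barcode01/barcode01_pass.fastq.gz', 'barcode02/barcode02_pass.fastq.gz']
```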

cmetadea commented 1 year ago

I did try putting the files into subdirectories. The first error was because my laptop runs on Apple silicon, so I needed to modify the Docker setup. After running again, it hit another error:

ERROR ~ Error executing process > 'pipeline:reference_assembly:map_reads (1)'

Caused by:
  Process `pipeline:reference_assembly:map_reads (1)` terminated with an error exit status (1)

Command executed:

  minimap2 -t 4 -ax splice -uf genome_index.mmi barcode05_full_length_reads.fastq        | samtools view -q 40 -F 2304 -Sb -        | seqkit bam -j 4 -x -T 'AlnContext: { Ref: "tn2_new.fasta", LeftShift: -24,
      RightShift: 24, RegexEnd: "[Aa]{8,}",
      Stranded: True,Invert: True, Tsv: "internal_priming_fail.tsv"} ' -        | samtools sort -@ 4 -o "barcode05_reads_aln_sorted.bam" - ;
  ((cat "barcode05_reads_aln_sorted.bam" | seqkit bam -s -j 4 - 2>&1)  | tee barcode05_read_aln_stats.tsv ) || true

  if [[ -s "internal_priming_fail.tsv" ]];
      then
          tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $4 }' - > "context_internal_priming_fail_start.fasta"
          tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $6 }' - > "context_internal_priming_fail_end.fasta"
  fi

Command exit status:
  1

Command output:
  (empty)

Command error:
  [WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.
  [M::main::0.096*0.77] loaded/built the index for 1 target sequence(s)
  [M::mm_mapopt_update::0.105*0.77] mid_occ = 10
  [M::mm_idx_stat] kmer size: 14; skip: 10; is_hpc: 0; #seq: 1
  [M::mm_idx_stat::0.108*0.77] distinct minimizers: 281156 (94.65% are singletons); average occurrences: 1.063; average spacing: 5.368; total length: 1604952
  [INFO] create FASTA index for tn2_new.fasta
  [ERRO] different line length in sequence: NZ_AP019730.1
  samtools sort: failed to read header from "-"

Work dir:
  /Users/cmetadea/Documents/MacDocs/RNAseq-analysis/work/94/6ec641cc5a59a6b8c97e022dfa0630

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
WARN: Killing running tasks (1)

Any suggestions? Thanks!
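(For anyone hitting the same `[ERRO] different line length in sequence` message: faidx-style FASTA indexing assumes every sequence is wrapped at a uniform line width, so a reference with inconsistent wrapping fails to index. Re-wrapping the FASTA fixes it. A sketch of the idea, with a made-up record name; on the command line, a tool such as `seqkit seq -w 60` should achieve the same, though check your version's docs:)

```python
def rewrap_fasta(lines, width=60):
    """Re-wrap FASTA records to a uniform line width so that
    faidx-style indexing (which assumes fixed-width sequence
    lines) can build its offsets."""
    out = []
    seq = []

    def flush():
        joined = "".join(seq)
        for i in range(0, len(joined), width):
            out.append(joined[i:i + width])
        seq.clear()

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            flush()            # emit the previous record's sequence
            out.append(line)   # keep the header unchanged
        else:
            seq.append(line)
    flush()
    return out


# Irregular line lengths in, uniform width out.
records = [">NZ_example", "ACGTACGTAC", "ACG", "ACGTACGT"]
print(rewrap_fasta(records, width=8))
# → ['>NZ_example', 'ACGTACGT', 'ACACGACG', 'TACGT']
```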

sarahjeeeze commented 12 months ago

My guess is either there isn't enough memory available, or there are no alignments between that particular FASTQ and the reference. How big is the reference file, and how much memory do you have available?

sarahjeeeze commented 11 months ago

Closing due to lack of response; the original issue here has been resolved, as we have removed use of the condition sheet. Feel free to open a new issue if required.

Claudia-Stone commented 6 months ago

Sorry this took me so long, but yes, the workflow is now working for me with the NCBI GTF file.