epi2me-labs / wf-clone-validation


[Bug]: BED format problem #24

Closed ChristopherRichie closed 10 months ago

ChristopherRichie commented 12 months ago

What happened?

When I run the test data, I get a BED file for barcode01:

barcode01 2913 4452 A -1
barcode01 2201 2657 D -1
barcode01 762 2046 F -1
barcode01 2085 2199 J -1
barcode01 124 649 G -1
barcode01 4518 4953 H -1
barcode01 2656 2914 C -1
barcode01 5231 5372 H -1

Snapgene rejected this BED file when I tried to import it as annotated features. I think the cause is the fifth column, because the file works once I remove that column.

What is the fifth column supposed to indicate here?

thanks

Operating System

ubuntu 18.04

Workflow Execution

Command line

Workflow Execution - EPI2ME Labs Versions

NIH HPC Biowulf, running nextflow and singularity with --cpus-per-task=14 --mem=16g --gres=lscratch:400,gpu:v100:1

Workflow Execution - CLI Execution Profile

Singularity

Workflow Version

wf-clone-validation v0.3.1-g97e1eab

Relevant log output

I am not sure what the "relevant log output" would be for this.
I do not know if the BED files come from the plannotate pipeline or from the "report" pipeline.

Jul-07 16:02:10.600 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:wfplasmid` matches labels `wfplasmid` for process with name pipeline:runPlannotate
Jul-07 16:02:10.601 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jul-07 16:02:10.601 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jul-07 16:02:10.608 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:wfplasmid` matches labels `wfplasmid` for process with name pipeline:inserts
Jul-07 16:02:10.609 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Jul-07 16:02:10.609 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jul-07 16:02:10.616 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:wfplasmid` matches labels `wfplasmid` for process with name pipeline:report
Jul-07 16:02:10.617 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local

Jul-07 16:16:32.168 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
Jul-07 16:16:32.309 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jul-07 16:16:32.328 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'FileTransfer' shutdown completed (hard=false)
Jul-07 16:16:32.329 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
sarahjeeeze commented 12 months ago

Hi, the 5th column is the orientation of the gene/annotation. I will update the documentation to make this clear in the near future.

ChristopherRichie commented 12 months ago

Hi Sarah, Thanks for your reply.

According to http://genome.ucsc.edu/FAQ/FAQformat#format1, the fifth column is for "score", and the sixth column is for "strand".

The "-1" does not appear to be compatible with the BED format. The "strand" column accepts only ".", "+", or "-".
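For anyone who wants to check a file against these two columns, here is a minimal sketch. The function name `check_bed_line` is made up for illustration and is not part of the workflow. It assumes whitespace-separated fields and checks only the start/end, score (column 5, an integer from 0 to 1000 per the UCSC FAQ) and strand (column 6) fields.

```python
# Hypothetical helper, not part of wf-clone-validation or plannotate.
VALID_STRANDS = {".", "+", "-"}


def check_bed_line(line):
    """Return a list of problems found in one BED record (empty if none)."""
    fields = line.split()
    if len(fields) < 3:
        return ["fewer than 3 columns"]

    problems = []
    # Columns 2 and 3: start and end coordinates.
    if not (fields[1].isdigit() and fields[2].isdigit()):
        problems.append("start/end must be non-negative integers")
    # Column 5: score, an integer between 0 and 1000.
    if len(fields) >= 5:
        score = fields[4]
        if not (score.isdigit() and int(score) <= 1000):
            problems.append(f"score {score!r} is not an integer in 0-1000")
    # Column 6: strand, one of ".", "+", "-".
    if len(fields) >= 6 and fields[5] not in VALID_STRANDS:
        problems.append(f"strand {fields[5]!r} is not one of ., +, -")
    return problems
```

Calling `check_bed_line("barcode01 2913 4452 A -1")` flags the fifth column, because "-1" is read as a score and falls outside the allowed range.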

Does the sample.BED file come from the basecalling workflow "result" pipeline? If it comes from the "plannotate" pipeline, then perhaps I have to direct this bug elsewhere.

Thanks Chris


sarahjeeeze commented 12 months ago

Hi, thanks for pointing this out. You are correct: the column is for strand, and the -1's should be -. I'll get this updated. This BED file is output by the workflow itself; plannotate just outputs a dataframe.

ChristopherRichie commented 12 months ago

Thanks.

I was able to amend the current BED file by giving the 5th column a "0" for all rows, and then adding a 6th column set to one of the three accepted strand values. The amended file was accepted and correctly interpreted by Snapgene (which was my intended use case at the time). I have not yet tried it with IGV or other visualization apps.
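The amendment described above could be scripted roughly as follows. This is only a sketch: `amend_bed_line` and `STRAND_BY_ORIENTATION` are illustrative names, not anything from the workflow. Mapping "-1" to "-" follows the maintainer's comment. Mapping "1" to "+" is my assumption, since no forward-orientation rows appear in the test output, and any other value falls back to ".".

```python
# Hypothetical helper, not part of wf-clone-validation.
# "-1" -> "-" is confirmed in this thread; "1" -> "+" is an assumption.
STRAND_BY_ORIENTATION = {"-1": "-", "1": "+"}


def amend_bed_line(line):
    """Turn a 5-column record (orientation in column 5) into 6-column BED.

    Raises ValueError if the record does not have exactly five fields.
    """
    chrom, start, end, name, orientation = line.split()
    # Unknown orientation values become ".", meaning no strand.
    strand = STRAND_BY_ORIENTATION.get(orientation, ".")
    # Column 5 becomes a placeholder score of 0; column 6 holds the strand.
    return "\t".join([chrom, start, end, name, "0", strand])


def amend_bed_text(text):
    """Amend every non-empty line of a BED file's contents."""
    return "\n".join(
        amend_bed_line(line) for line in text.splitlines() if line.strip()
    )
```

The output is tab-separated, which is what the BED format expects; the input can be separated by spaces or tabs.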

Thanks again, Chris


sarahjeeeze commented 11 months ago

The strand issue should now be resolved in the latest version.

sarahjeeeze commented 10 months ago

Closing due to lack of response; the issue should now be resolved. Thanks again for the feedback.