primer BED file isn't in IVAR format

artic-network / artic-ncov2019

ARTIC nanopore protocol for nCoV2019 novel coronavirus

Creative Commons Attribution 4.0 International


primer BED file isn't in IVAR format #12

Closed by tseemann 3 years ago

tseemann commented 4 years ago

https://andersen-lab.github.io/ivar/html/manualpage.html

They need a score in col 5 and a strand in col 6?

Puerto  28    52    400_1_out_L   60  +
Puerto  482   504   400_1_out_R   60  -
Puerto  359   381   400_2_out_L   60  +
Puerto  796   818   400_2_out_R   60  -
Puerto  658   680   400_3_out_L*  60  +
Puerto  1054  1076  400_3_out_R*  60  -

You have

MN908947        30      54      nCoV-2019_1_LEFT        nCoV-2019_1
MN908947        385     410     nCoV-2019_1_RIGHT       nCoV-2019_1
MN908947        320     342     nCoV-2019_2_LEFT        nCoV-2019_2
MN908947        704     726     nCoV-2019_2_RIGHT       nCoV-2019_2
MN908947        642     664     nCoV-2019_3_LEFT        nCoV-2019_1
MN908947        1004    1028    nCoV-2019_3_RIGHT       nCoV-2019_1
MN908947        943     965     nCoV-2019_4_LEFT        nCoV-2019_2
MN908947        1312    1337    nCoV-2019_4_RIGHT       nCoV-2019_2
MN908947        1242    1264    nCoV-2019_5_LEFT        nCoV-2019_1
MN908947        1623    1651    nCoV-2019_5_RIGHT       nCoV-2019_1
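The difference between the two layouts above can be checked mechanically. The following is a minimal sketch, not iVar's actual parser: it assumes tab-separated input and treats a line as 6-column BED when column 5 is an integer score and column 6 is a strand. The function name `is_six_column_bed` is hypothetical.

```python
def is_six_column_bed(line: str) -> bool:
    """Return True if a tab-separated line looks like 6-column BED.

    Assumption: columns are chrom, start, end, name, score, strand,
    with an integer score and a strand of "+" or "-".
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        return False
    start, end, score, strand = fields[1], fields[2], fields[4], fields[5]
    return (
        start.isdigit()
        and end.isdigit()
        and score.isdigit()
        and strand in {"+", "-"}
    )


# One line in each layout, taken from the examples in this issue.
ivar_style = "Puerto\t28\t52\t400_1_out_L\t60\t+"
artic_style = "MN908947\t30\t54\tnCoV-2019_1_LEFT\tnCoV-2019_1"

print(is_six_column_bed(ivar_style))
print(is_six_column_bed(artic_style))
```

The ARTIC line fails the check because it has only five columns, with the pool name where a score would be.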

gkarthik commented 4 years ago

I used the 6-column BED format for iVar (I assumed this was the standard?), as documented by the UCSC Genome Browser. It seems to match the default 6-column BED generated by bedtools bamtobed. I can also modify iVar to read the format specified here:

Region, chromStart, chromEnd, primer_name, amplicon_name

tseemann commented 4 years ago

perl -ne 'my @x=split m/\t/; print join("\t",@x[0..3], 60, $x[3]=~m/LEFT/?"+":"-"),"\n";' < nCoV-2019.scheme.bed > ARTIC-V1.bed
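For readers who would rather not parse the one-liner, here is a rough Python equivalent. It is a sketch under the same assumptions as the perl command: the input is tab-separated, the first four columns are kept, the score is fixed at 60, and the strand is "+" when the primer name contains LEFT and "-" otherwise. The function name `artic_to_bed6` is hypothetical.

```python
def artic_to_bed6(line: str, score: int = 60) -> str:
    """Convert one ARTIC scheme line to 6-column BED.

    Assumption: the input is tab-separated with at least four columns
    (chrom, start, end, primer_name); anything after that is dropped.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, name = fields[:4]
    # Same rule as the perl one-liner: LEFT primers are on the + strand.
    strand = "+" if "LEFT" in name else "-"
    return "\t".join([chrom, start, end, name, str(score), strand])


if __name__ == "__main__":
    example = [
        "MN908947\t30\t54\tnCoV-2019_1_LEFT\tnCoV-2019_1\n",
        "MN908947\t385\t410\tnCoV-2019_1_RIGHT\tnCoV-2019_1\n",
    ]
    for row in example:
        print(artic_to_bed6(row))
```

Note that this drops the pool name in column 5, exactly as the perl command does.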

joshquick commented 4 years ago

I'll fix this now. The score field is never going to be used, so I could hijack it for the pool number.
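One way the pool number could end up in the score column is sketched below. This is only an illustration of the idea, not the fix that was actually committed: it assumes the pool name in column 5 ends in `_<number>` (as in `nCoV-2019_1`), that the input is tab-separated, and that strand is derived from LEFT/RIGHT in the primer name. The function name `pool_as_score` is hypothetical.

```python
def pool_as_score(line: str) -> str:
    """Rewrite an ARTIC scheme line as 6-column BED with the pool
    number in the score column.

    Assumption: column 5 is a pool name ending in "_<number>".
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, name, pool = fields[:5]
    pool_number = pool.rsplit("_", 1)[-1]  # "nCoV-2019_1" -> "1"
    if not pool_number.isdigit():
        raise ValueError(f"cannot find a pool number in {pool!r}")
    strand = "+" if "LEFT" in name else "-"
    return "\t".join([chrom, start, end, name, pool_number, strand])


if __name__ == "__main__":
    print(pool_as_score("MN908947\t320\t342\tnCoV-2019_2_LEFT\tnCoV-2019_2"))
```

A tool that ignores the score would read this as ordinary 6-column BED, while a pool-aware tool could still recover the pool from column 5.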

gkarthik commented 4 years ago

@joshquick The V3 BED file is in the old ARTIC BED format (column 5 is amplicon_name), whereas V1 and V2 are in the usual 6-column BED format. Just in case this was missed :)

gkarthik commented 4 years ago

For now, this will convert the V3 BED file to the standard BED format by replacing column 5 with a score of 60:

awk -F $'\t' 'BEGIN{OFS=FS;}{$5=60;print}' nCoV-2019.bed > nCoV-2019_col5_replaced.bed

tseemann commented 4 years ago

From Matthew Croxen :maple_leaf::

awk -v OFS='\t' '$5=60' nCoV-2019.bed

staciawyman commented 4 years ago

Bump! The V3 BED file is still in the old ARTIC BED format and doesn't work with iVar.