AG-Boerries / CAST-Seq

CAST-Seq Bioinformatic pipeline
GNU Affero General Public License v3.0

additional files in G3_TOY/data #2

Open kelsi-kw opened 2 years ago

kelsi-kw commented 2 years ago

Hello! I've been able to get CAST-Seq to run on the G3_TOY data, but I was curious what the other files called from the G3_TOY directory are. I can't find a description in the paper or in the repo. I am trying to figure out what these files should be for my own data: headTOhead.fa, linker.fa, linker_RC.fa, mispriming.fa, neg.fa, pos.fa.

Thank you for your help!

peggy314pch commented 1 year ago

Did you eventually figure out what those input files are, @kelsi-kw? I tried to BLAST the sequences, and some of them hit both CCR2 and CCR5, but I'm not sure how the authors picked those sequences. My guess is that the mispriming file is probably a sequence similar to the guide, but I wonder how they found it. Also, do you know what the ots.bed file is and how it was generated? Maybe we should reach out to the authors. Thank you so much!

kelsi-kw commented 1 year ago

I did end up reaching out to them! Thanks for the post to remind me to answer, @peggy314pch. This is the response I got from them on the description of their files:

"Thank you for pointing this out. A proper definition is indeed needed. This should be fixed with the next update. In the meantime, here is the description of these files:

pos.fa: positive filter (designer nuclease target site): select reads containing this sequence before the cut site (2 mismatches allowed, min length=25).
mispriming.fa (optional): Discard reads containing this sequence (to eliminate PCR mispriming products). Unless specific case, this is usually set as XXXX to keep all reads.
linker_RC.fa: Reverse complement of linker used for ligation. This sequence will be trimmed (like the adapters).

The other files are either deprecated or not yet implemented in the pipeline. I suggest you to do the following:

neg.fa: use XXXXX
headTOhead.fa: use XXXXX
linker.fa: this file is not needed anymore."
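For anyone preparing these files for their own data, here is a rough Python sketch of the two ideas in that reply: deriving the linker_RC.fa sequence from a linker, and a positive filter that tolerates up to 2 mismatches. This is not the pipeline's own code. The function names and the example sequences are made up for illustration, and the "min length=25" setting is not modelled because the reply does not say exactly what it applies to.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def contains_with_mismatches(read: str, query: str, max_mismatches: int = 2) -> bool:
    """True if any window of the read matches the query with at most
    max_mismatches substitutions (no insertions or deletions)."""
    read, query = read.upper(), query.upper()
    k = len(query)
    for i in range(len(read) - k + 1):
        if hamming(read[i:i + k], query) <= max_mismatches:
            return True
    return False


# Hypothetical sequences, for illustration only.
linker = "ACGTT"
print(reverse_complement(linker))

read = "TTTTACGAACGTTTT"
target = "ACGTACG"
print(contains_with_mismatches(read, target, max_mismatches=2))
```

A sliding Hamming-distance window is the simplest way to express "N mismatches allowed"; a real pipeline would more likely rely on a dedicated read-filtering tool, and may also handle indels.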

peggy314pch commented 1 year ago

Thank you so much for sharing the info, @kelsi-kw !! This is very useful :D

peggy314pch commented 1 year ago

Also, @kelsi-kw, do you know what the ots.bed file is? From the name it looks like "off-target site", but I'm not sure how to generate it.

kelsi-kw commented 1 year ago

The only thing I found for that is "--onTarget name of ON-target bed file (default "ots.bed")".

peggy314pch commented 1 year ago

Interesting. I looked at the file on their G3_TOY/data page (https://github.com/AG-Boerries/CAST-Seq/blob/master/samples/G3_TOY/data/ots.bed) and ots.bed contains "chr3 46372985 46373015 G3_OTS 1000 +". I wonder how they generated it, especially the columns "G3_OTS 1000".

kelsi-kw commented 1 year ago

The region and naming follow the BED file format: https://en.wikipedia.org/wiki/BED_(file_format)
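To make the column meanings concrete, here is a small Python sketch that splits the ots.bed line quoted above into the six standard BED fields. In BED, column 4 is a free-text name and column 5 is a score between 0 and 1000, so "G3_OTS" is just a label and "1000" is the score. Whether CAST-Seq actually uses the score value is not stated in this thread. The `BedRecord` type and `parse_bed6` function are names invented for this example.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str   # chromosome name
    start: int   # 0-based start, inclusive
    end: int     # end, exclusive
    name: str    # free-text label for the region
    score: int   # integer from 0 to 1000 in the BED convention
    strand: str  # "+", "-" or "."


def parse_bed6(line: str) -> BedRecord:
    """Parse one whitespace-separated BED line with at least six columns."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, start, end, name, score, strand = fields[:6]
    return BedRecord(chrom, int(start), int(end), name, int(score), strand)


rec = parse_bed6("chr3 46372985 46373015 G3_OTS 1000 +")
print(rec)
print("region length:", rec.end - rec.start)
```

Because BED intervals are 0-based and half-open, the region length is simply end minus start, which is 30 bases for this line.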

panxiaoguang commented 8 months ago

Interesting. I looked at the file on their G3_TOY/data page (https://github.com/AG-Boerries/CAST-Seq/blob/master/samples/G3_TOY/data/ots.bed) and ots.bed contains "chr3 46372985 46373015 G3_OTS 1000 +". I wonder how they generated it, especially the columns "G3_OTS 1000".

Hi, have you figured out the meaning of "1000"? I'm a new user of this pipeline and I also want to know what the 1000 means.

A-Chalk commented 3 months ago

As far as I can tell:

I recommend using SnapGene for this; it is very helpful.