btmartin721 / ClineHelpR

Detects outliers and plots genomic clines from BGC output, and extends the plotting functionality of INTROGRESS to correlate genomic clines and hybrid indices with environmental variables.
GNU General Public License v3.0

pafscaff #9

Open: mimulusm opened this issue 3 years ago

mimulusm commented 3 years ago

Hello,

This R package helps me a lot! I am trying to plot an ideogram from the bgc output. The species I am working on has a draft genome (nearly chromosome-level), and I mapped my reads to it before running bgc. Is it possible to plot the output using the FASTA file we mapped the reads to, instead of the PAFScaff output file? Alternatively, perhaps I could create a dummy .scaffolds.tdt file in the PAFScaff format. Can you give me details of the contents of the .scaffolds.tdt file?

Thank you very much in advance!

tkchafin commented 3 years ago

Hi, the function would need to be modified, since it currently requires the .scaffolds.tdt file. Here's an example of what that file looks like:

Qry     Scaffold        Description     QryLen  QryStart        QryEnd  Strand  Ref     RefLen  RefStart        RefEnd  RefMap  Identity        Length  Coverage        N       Rank    Inv     Trans
query1.01       1  1 len=51.46 Mb 1.03% 1(+) 74,333:77,337,987; 0.90% 1(-); 8.98% other;      51464084        7953    51441722        +       1       77392008        74333   77337987        Anchored        485780  541756  531399.0        446     1       465116.0        4619909.0
query1.02       1  1 len=43.32 Mb 8.98% 1(+) 5,070:77,338,662; 1.21% 1(-); 12.20% other;      43319783        18524   43288930        +       1       77392008        5070    77338662        Placed  3613688 4005849 3890364.0       571     1       525171.0        5283668.0

It might be a while before we can work on that, so in the meantime, if you think you can spoof that format, that might be the fastest way.
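As a rough illustration of spoofing that format, the sketch below reads sequence lengths straight from your own draft FASTA and writes a tab-separated file with the 19 column headers from the example above, treating every sequence as mapping 1:1 onto itself. This is a hypothetical helper, not part of ClineHelpR or PAFScaff: which columns ClineHelpR actually reads is not stated in this thread, so the values for RefMap, Identity, Length, Coverage, N, Rank, Inv and Trans are placeholder assumptions, as is the 1-based start coordinate.

```python
"""Hypothetical sketch: spoof a PAFScaff-style .scaffolds.tdt from your own assembly.

Every sequence is treated as mapping 1:1 onto itself, so Qry, Scaffold and Ref
are all the sequence name. The values written for RefMap, Identity, Length,
Coverage, N, Rank, Inv and Trans are placeholders (assumptions), not values
that PAFScaff would compute.
"""
import csv
import io
from typing import Dict, Iterable, TextIO

# Column order copied from the example .scaffolds.tdt header in this thread.
TDT_COLUMNS = [
    "Qry", "Scaffold", "Description", "QryLen", "QryStart", "QryEnd", "Strand",
    "Ref", "RefLen", "RefStart", "RefEnd", "RefMap", "Identity", "Length",
    "Coverage", "N", "Rank", "Inv", "Trans",
]


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} from FASTA text lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def write_dummy_tdt(lengths: Dict[str, int], out: TextIO) -> None:
    """Write a header plus one self-mapped, tab-separated row per sequence."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(TDT_COLUMNS)
    for name, length in lengths.items():
        writer.writerow([
            name, name, f"{name} len={length}",  # Qry, Scaffold, Description
            length, 1, length, "+",              # QryLen, QryStart (1-based, assumed), QryEnd, Strand
            name, length, 1, length,             # Ref, RefLen, RefStart, RefEnd
            "Anchored",                          # RefMap (placeholder)
            length, length, float(length),       # Identity, Length, Coverage (placeholders)
            1, 1, 0.0, 0.0,                      # N, Rank, Inv, Trans (placeholders)
        ])


if __name__ == "__main__":
    demo_fasta = io.StringIO(">chr1 draft\nACGTACGT\nACGT\n>chr2\nAAAA\n")
    buf = io.StringIO()
    write_dummy_tdt(fasta_lengths(demo_fasta), buf)
    print(buf.getvalue(), end="")
```

For a real assembly you would pass an open file handle instead of the demo string, e.g. `with open("draft.fasta") as fh, open("draft.scaffolds.tdt", "w") as out: write_dummy_tdt(fasta_lengths(fh), out)` (file names are hypothetical). If the ideogram function rejects the file, compare it column by column with a genuine PAFScaff output.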

mimulusm commented 3 years ago

Thank you very much. It looks tricky, but I'll try.

btmartin721 commented 3 years ago

Hi,

If you have a FASTA file for the reference genome and another FASTA file for your query scaffolds, you can run them through minimap2 to generate a PAF output file, and then use the PAF file as input to PAFScaff. PAFScaff then generates the .scaffolds.tdt file that you can use with the ideograms.
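One common way to produce the PAF file is `minimap2 -x asm5 reference.fa query.fa > aln.paf` (the preset and file names are assumptions; choose the preset that matches how divergent your assemblies are). To show what PAFScaff is consuming, the sketch below parses the 12 mandatory tab-separated PAF columns and reports, for each query scaffold, the reference sequence with the most matching bases. It only illustrates the PAF format; it is not PAFScaff's placement or anchoring algorithm.

```python
"""Illustrative PAF reader (not PAFScaff's algorithm).

Parses the 12 mandatory PAF columns written by minimap2 and, for each query
sequence, keeps the reference with the largest summed number of residue matches.
"""
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class PafHit(NamedTuple):
    qname: str
    qlen: int
    qstart: int     # 0-based start on the query
    qend: int       # 0-based, end-exclusive position on the query
    strand: str     # "+" or "-"
    tname: str
    tlen: int
    tstart: int     # 0-based start on the target (reference)
    tend: int       # 0-based, end-exclusive position on the target
    matches: int    # number of residue matches
    block_len: int  # alignment block length
    mapq: int       # mapping quality (255 means missing)


def parse_paf(lines: Iterable[str]) -> Iterator[PafHit]:
    """Yield one PafHit per non-empty line; optional SAM-like tags are ignored."""
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        yield PafHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                     f[5], int(f[6]), int(f[7]), int(f[8]),
                     int(f[9]), int(f[10]), int(f[11]))


def best_reference(hits: Iterable[PafHit]) -> Dict[str, Tuple[str, int]]:
    """Map each query to (reference with the most summed matches, that sum)."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for h in hits:
        totals[h.qname][h.tname] += h.matches
    return {q: max(refs.items(), key=lambda kv: kv[1])
            for q, refs in totals.items()}


if __name__ == "__main__":
    demo = io.StringIO(
        "scaf1\t1000\t0\t900\t+\tchr1\t5000\t100\t1000\t850\t900\t60\ttp:A:P\n"
        "scaf2\t500\t0\t500\t-\tchr2\t4000\t200\t700\t480\t500\t60\n"
    )
    print(best_reference(parse_paf(demo)))
```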

Is that what you had in mind, or am I perhaps misunderstanding what you were wanting to do?

-Bradley

mimulusm commented 3 years ago

Thank you for your response, Bradley. Maybe I didn't make myself clear: I mapped the reads for bgc to the draft genome of the study species itself (which has a GFF file and is almost at chromosome level), and I want to use this information instead of the reference genome of a related species. Perhaps I should also look at other packages for drawing an ideogram.

btmartin721 commented 3 years ago

Oh, OK. Now I understand.

I'm going to mark this as a requested feature that we will try to get to in the future.

In the meantime, if you don't want to use the PAFScaff files to make the ideogram, you could write your own script using the RIdeogram package (https://cran.r-project.org/web/packages/RIdeogram/vignettes/RIdeogram.html), which is what we use under the hood to make the ideogram plots. Your script would just need to get your data into a format that RIdeogram's ideogram() function accepts.
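As a starting point for that script, the sketch below prepares two tab-separated tables of the kind shown in the RIdeogram vignette linked above: a karyotype table (Chr, Start, End) built from your draft assembly's chromosome lengths, and an overlay table (Chr, Start, End, Value) built by averaging per-locus values in fixed-size windows. The per-locus input, for example one bgc cline parameter per SNP given as (chrom, position, value), is a hypothetical format, and the vignette's karyotype example also carries centromere columns (CE_start, CE_end) that this sketch omits; check the vignette for what your RIdeogram version requires.

```python
"""Hypothetical sketch: build RIdeogram-style input tables from your own data.

Assumes (1) chromosome lengths from your draft assembly and (2) per-locus values
as (chrom, 1-based position, value) tuples, e.g. a bgc cline parameter per SNP.
"""
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, TextIO, Tuple

Row = Tuple[str, int, int, float]


def write_karyotype(lengths: Dict[str, int], out: TextIO) -> None:
    """Write one Chr/Start/End row per chromosome, spanning 1..length."""
    w = csv.writer(out, delimiter="\t", lineterminator="\n")
    w.writerow(["Chr", "Start", "End"])
    for chrom, length in lengths.items():
        w.writerow([chrom, 1, length])


def window_means(loci: Iterable[Tuple[str, int, float]], window: int,
                 lengths: Optional[Dict[str, int]] = None) -> List[Row]:
    """Average values in non-overlapping windows of `window` bp per chromosome.

    If `lengths` is given, the last window on each chromosome is clipped so it
    does not run past the chromosome end.
    """
    acc: Dict[Tuple[str, int], List[float]] = defaultdict(lambda: [0.0, 0.0])
    for chrom, pos, value in loci:
        key = (chrom, (pos - 1) // window)  # 1-based position -> window index
        acc[key][0] += value                # running sum
        acc[key][1] += 1                    # running count
    rows: List[Row] = []
    for (chrom, idx), (total, count) in sorted(acc.items()):
        start = idx * window + 1
        end = (idx + 1) * window
        if lengths is not None:
            end = min(end, lengths[chrom])
        rows.append((chrom, start, end, total / count))
    return rows


def write_overlay(rows: Iterable[Row], out: TextIO) -> None:
    """Write the windowed values as a Chr/Start/End/Value table."""
    w = csv.writer(out, delimiter="\t", lineterminator="\n")
    w.writerow(["Chr", "Start", "End", "Value"])
    for row in rows:
        w.writerow(row)


if __name__ == "__main__":
    lengths = {"chr1": 180, "chr2": 90}
    loci = [("chr1", 1, 0.2), ("chr1", 50, 0.4), ("chr1", 150, 1.0), ("chr2", 10, -0.5)]
    kary, over = io.StringIO(), io.StringIO()
    write_karyotype(lengths, kary)
    write_overlay(window_means(loci, 100, lengths), over)
    print(kary.getvalue(), over.getvalue(), sep="\n", end="")
```

In R, the two files could then be read with `read.table(..., header = TRUE)` and passed to `ideogram()` as the karyotype and overlaid data; see the vignette for the exact arguments and colour options.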

-Bradley

mimulusm commented 3 years ago

Thank you very much, Bradley. I was already considering RIdeogram. ClineHelpR has helped me a lot. Thank you again. -Makiko