Closed (alpapan closed 1 month ago)
if you are trying to load raw reads as a track, I would definitely recommend converting to SAM, then on to BAM or CRAM, and then loading the result as a plain old "AlignmentsTrack" (auto-inferred by e.g. jbrowse add-track yourfile.bam)
the PAF adapter was created to visualize "assembly-to-assembly" alignments, so e.g. the output of
minimap2 genome1.fa genome2.fa
rather than minimap2 genome.fa reads.fq
The PAF adapter is meant to be used with a JBrowse "SyntenyTrack", where both genome1.fa and genome2.fa are loaded as assemblies in your config.json. You supply multiple assembly names to the add-track command to indicate that the track belongs to both assemblies, e.g. jbrowse add-track yourfile.paf -a genome2,genome1 (note the order is flipped compared to the order in the minimap2 command).
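A small sketch of why the assembly names end up flipped: minimap2 takes the target first and the query second, while a PAF record lists the query name in column 1 and the target name in column 6. The record below is made up for illustration (the sequence names chrA and chr1 and all coordinates are hypothetical):

```shell
# One hypothetical PAF record, shaped like what `minimap2 genome1.fa genome2.fa`
# emits: genome1.fa is the target (reference), genome2.fa is the query.
paf_line=$(printf 'chrA\t1000\t0\t900\t+\tchr1\t2000\t100\t1000\t850\t900\t60')

# PAF column 1 is the query sequence name, column 6 is the target sequence name.
order=$(printf '%s\n' "$paf_line" | awk -F'\t' '{ print "query=" $1 " target=" $6 }')
echo "$order"
```

Here chrA would belong to genome2 (the query) and chr1 to genome1 (the target), which matches listing genome2 first in -a genome2,genome1.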
so basically, you'd want to create a de-novo assembly to use the PAFAdapter or the PairwiseIndexedPAFAdapter. the PAFAdapter is good for smallish genome-to-genome alignments, but it loads the entire PAF into memory. the PairwiseIndexedPAFAdapter can load only the portions relevant to what you are viewing, which helps particularly when you load a "synteny track" in the linear genome view.
example session link: https://jbrowse.org/code/jb2/v2.15.1/?config=test_data%2Fhs1_vs_mm39%2Fconfig.json&session=share-yzS0ST28zx&password=M8QHY
in that session, the lower panel will load much faster because it uses the PairwiseIndexedPAFAdapter (e.g. pif.gz), while the top panel tries to load the human vs mouse comparison PAF entirely into memory, which is about 70 MB gzipped or 200+ MB in memory.
note that these whole genome alignments are challenging and we're still working on scalability for larger genomes, so we're definitely interested to hear if you are running into those limits.
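For the indexed route, a rough sketch could look like the following. It assumes a recent JBrowse CLI that provides the make-pif command, that you run it from the directory containing config.json, and that --load copy is the load option you want; the function name and file names are placeholders, not part of JBrowse.

```shell
# Sketch: convert an assembly-to-assembly PAF to the indexed PIF format and
# load it as a synteny track. Assumes the jbrowse CLI (with make-pif) is on
# PATH; nothing runs until the function is called.
paf_to_pif_track() {
  paf=$1         # e.g. yourfile.paf from: minimap2 genome1.fa genome2.fa
  assemblies=$2  # e.g. genome2,genome1 (query first, then target)

  # make-pif is expected to write yourfile.pif.gz next to the input
  jbrowse make-pif "$paf" || return 1

  # Assumption: --load copy copies the file into the JBrowse directory
  jbrowse add-track "${paf%.paf}.pif.gz" -a "$assemblies" --load copy
}

# Usage (hypothetical file names):
#   paf_to_pif_track yourfile.paf genome2,genome1
```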
here is a guide showing minimap2 FASTQ->CRAM workflow: https://www.htslib.org/workflow/fastq.html
it is sort of focused on paired-end reads, but the same idea can be applied to long reads
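Applied to long reads, that workflow might be sketched roughly as below. It assumes minimap2 and samtools are installed; the map-pb preset (PacBio CLR) is an assumption about the read type, and the function name and file names are placeholders.

```shell
# Sketch: align long reads to a genome and produce a sorted, indexed BAM.
# Assumes minimap2 and samtools are on PATH; nothing runs until the function
# is called. map-pb is minimap2's preset for PacBio CLR reads (map-hifi
# exists for HiFi reads in newer minimap2 versions).
reads_to_bam() {
  ref=$1    # e.g. genome.fa
  reads=$2  # e.g. reads.fq
  out=$3    # e.g. reads.bam

  # -a makes minimap2 emit SAM instead of PAF; samtools sort writes the BAM
  minimap2 -ax map-pb "$ref" "$reads" | samtools sort -o "$out" - || return 1

  # Build the index so viewers can fetch only the region being displayed
  samtools index "$out"
}

# Usage (hypothetical file names):
#   reads_to_bam genome.fa reads.fq reads.bam
#   jbrowse add-track reads.bam --load copy
```

The resulting BAM is indexed, so only the region being viewed needs to be read, unlike a plain PAF that is loaded whole.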
ok, thank you!
p.s. the whole genome alignments/synteny views for jb2 work really well for me (genomes 300-400 Mb), love it and use it a lot
I've been using minimap to map my PacBio long reads to my genome. The output is about 6 GB uncompressed.
Loaded uncompressed, it gives a "file size is greater than 2gb" error.
I could convert it to another file format, or we could use the existing PAFAdapter (PIF is twice the size and I'm not sure if it would index the reads or the assembly; regardless, we don't need the reads).
But the PAFAdapter does not use indexes and I haven't managed to make it load on my desktop.
this is how I create a sorted, indexed PAF
but I haven't been able to visualise it: I get empty tracks (whether loaded as an alignment, feature, or synteny track), either for the alignment against the whole genome or against just one scaffold.
I'm just going to use paftools.js splice2bed and convert it to BED (and maybe just use the SAM output of minimap2), but I'm just wondering if the above makes sense?