Closed WenyuLiang closed 2 years ago
For haploid assembly, you can do that using the following command line option:
--Assembly.writeReadsByAssembledSegment
I have not tested this option in some time, so if you bump into problems please post here and I will look into it.
For diploid assembly, this functionality is not available.
A bit more information on that option. If you turn it on, the assembly directory will contain a csv file named ReadsBySegment.csv. The top of the file looks like this:
The meaning of the columns is as follows:
- AssembledSegmentId identifies an assembled segment (same identifier used in other assembly output such as Assembly.fasta).
- EdgeCount is the length of that assembled segment (number of edges) in the marker graph.
- OrientedReadCount is the number of oriented reads that were used to assemble the segment. An oriented read is a read in either the original orientation or reverse complemented.
- OrientedReadId is the Shasta internal id of a read that was used to assemble the segment. It uses the format ReadId-Strand, where Strand can be 0 (original orientation) or 1 (reverse complemented). So for example 66-1 means read 66, reverse complemented. To convert the Shasta internal ReadId to the read name in the input fasta/fastq files, you can use the first two columns of ReadSummary.csv.
- VertexCount and EdgeCount are the number of marker graph vertices and edges, respectively, that the given oriented reads appear on, out of the vertices and edges that make up the assembled segment.

Thank you so much!!!
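For anyone landing here later, the column description above could be turned into a small lookup script along these lines. This is only a rough sketch, not something shipped with Shasta, and it rests on a few assumptions: both files are comma-separated with a single header row, ReadsBySegment.csv has one row per (segment, oriented read) with the columns in the order listed above, and the first two columns of ReadSummary.csv are the Shasta ReadId followed by the read name. The function names are made up for illustration. Columns are read by position rather than by header name because EdgeCount appears twice.

```python
import csv
from collections import defaultdict


def load_read_names(read_summary_path):
    """Map Shasta ReadId (assumed column 1) to read name (assumed column 2)."""
    names = {}
    with open(read_summary_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        for row in reader:
            if len(row) >= 2:
                names[row[0].strip()] = row[1].strip()
    return names


def parse_oriented_read_id(oriented_read_id):
    """Split 'ReadId-Strand', e.g. '66-1' -> ('66', 1)."""
    read_id, strand = oriented_read_id.strip().rsplit("-", 1)
    return read_id, int(strand)


def reads_by_segment(reads_by_segment_path, names):
    """Return {AssembledSegmentId: [(read name, strand), ...]}.

    Assumes column 1 is AssembledSegmentId and column 4 is OrientedReadId.
    """
    result = defaultdict(list)
    with open(reads_by_segment_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        for row in reader:
            if len(row) < 4:
                continue  # ignore blank or truncated lines
            segment_id = row[0].strip()
            read_id, strand = parse_oriented_read_id(row[3])
            # Fall back to the numeric id if the name is not found.
            result[segment_id].append((names.get(read_id, read_id), strand))
    return dict(result)
```

Calling load_read_names on ReadSummary.csv and passing the result to reads_by_segment together with ReadsBySegment.csv gives, for each assembled segment, the names of the input reads and the strand (0 or 1) on which each was used. If the actual file layout differs from the assumptions above, the column indices would need adjusting.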
I am closing this due to lack of additional discussion. If other questions emerge, feel free to open another issue.
Hi! Is there a way to find out which raw reads go into a specific contig?