Closed pontushojer closed 3 years ago
Looks really nice. Since the PR description is so good, I would also copy part of it into the permanent documentation, docs/develop.rst.
Contigs are now defined at three levels:
- `all` = every contig in the reference.
- `primary` = every contig in the reference that should go through certain post-processing steps (see below). Is a subset of `all`.
- `phased` = every contig in the reference that is diploid, i.e. can be phased. Is a subset of `primary`.

Post-processing steps run on `primary` contigs but not on `all`:
- `find_clusterdups` + `get_barcode_merges`
- `concat_molecule_stats` + `get_barcodes_to_filter`
- `call_variants`
@FrickTobias Thanks for the input!
I have added a new section in develop.rst relating to the pipeline, which contains the info.
Reasons to add `primary` contigs include:
- `get_barcode_merges` and `get_barcodes_to_filter` are both bottlenecks in the pipeline, where every chunk must be processed before the pipeline can proceed. Thus, the fewer contigs (and, by extension, chunks) there are, the faster processing will be.
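For anyone reading along, the three-level definition can be sketched in a few lines of Python. This is only an illustration of the subset chain `phased` ⊆ `primary` ⊆ `all`, not the code in this PR: the function name `contig_levels` and the contig names are hypothetical, and which contigs count as primary or diploid is assumed here purely for the example.

```python
def contig_levels(all_contigs, primary, phased):
    """Group contigs into the three levels and check the subset chain.

    Hypothetical helper for illustration only; not part of the pipeline.
    """
    all_set = set(all_contigs)
    primary_set = set(primary)
    phased_set = set(phased)

    # phased must be a subset of primary, which must be a subset of all.
    if not primary_set <= all_set:
        raise ValueError("primary contigs must be a subset of all contigs")
    if not phased_set <= primary_set:
        raise ValueError("phased contigs must be a subset of primary contigs")

    # Keep the order in which the contigs appear in the reference.
    return {
        "all": list(all_contigs),
        "primary": [c for c in all_contigs if c in primary_set],
        "phased": [c for c in all_contigs if c in phased_set],
    }


# Assumed example reference: two diploid chromosomes, the mitochondrial
# contig, and two unplaced scaffolds (all names are made up).
levels = contig_levels(
    all_contigs=["chr1", "chr2", "chrM", "scaffold_1", "scaffold_2"],
    primary=["chr1", "chr2", "chrM"],
    phased=["chr1", "chr2"],
)

# Only the primary contigs would reach the bottleneck steps, so here
# 3 contigs instead of 5 would have to be processed before continuing.
n_skipped = len(levels["all"]) - len(levels["primary"])
```

In this sketch the two scaffolds are skipped by the bottleneck steps, which is the speed-up argument above in miniature.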