Question on 10x Genomics data

bcgsc / physlr

:chains: Construct a Physical Map from Linked Reads

GNU General Public License v3.0


Question on 10x Genomics data #193

Closed YingChen94 closed 4 months ago

YingChen94 commented 4 months ago

Hello,

Thank you for making this tool! I have been struggling with Supernova assembly of 10x data because of the large genome size (4 Gb). Does Physlr have any computational resource requirements for a genome this large? For building physical maps, is the input the raw sequencing data, or do I need to do any pre-processing? I am thinking of using a different species in the same genus as the reference genome. Is that appropriate?

Thank you! Ying

aafshinfard commented 4 months ago

Hello @YingChen94, and thank you for opening an issue. The input from 10x needs to be preprocessed with LongRanger, and the barcodes should be in the read header in the BX:Z tag format common for 10x reads. Our experiments on human data showed RAM requirements of up to 200 GB, but memory use does not necessarily scale with genome size; it depends on the coverage too. I previously ran it successfully on a plant genome of roughly that size, but the physical maps were not very contiguous. I have no record of the RAM usage, but it was surely below 1 TB, as that is the maximum on the machine I used. Physlr is a de novo tool and does not require a reference genome, but it can accept one for evaluation purposes. If you trust that the reference genome is closely related, you may use it to get some sense of the quality of the results.

YingChen94 commented 4 months ago

Thank you for the fast reply! Do I need to filter out the reads without the "BX:Z" barcode (I think those reads have invalid barcodes) before running Physlr?

aafshinfard commented 4 months ago

Yes @YingChen94 , I believe you need to filter them before running Physlr. Does Longranger not filter them by default or by setting a flag?

aafshinfard commented 4 months ago

@YingChen94 if you are using the most recent version of Physlr (which uses btllib for indexlr) then it should be able to handle/filter reads with no barcodes as well.

YingChen94 commented 4 months ago

Hi Amirhossein, the output from Longranger includes reads without the "BX:Z" barcode tag. I am using Physlr v1.0.4, and it seems to have run the first step (histogram) successfully.
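
For anyone on an older Physlr version who needs to drop barcode-less reads beforehand, the filtering discussed above could be sketched roughly as follows. This is an illustration only, not part of Physlr or Longranger: the function names `has_barcode` and `filter_barcoded` are hypothetical, and the sketch assumes plain 4-line FASTQ records with the `BX:Z:` tag placed after whitespace in the read header, as described earlier in this thread.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: the barcode appears in the header comment as "BX:Z:<value>",
# separated from the read name by a space or tab.
BX_TAG = re.compile(r"\sBX:Z:\S+")


def has_barcode(header: str) -> bool:
    """Return True if a FASTQ header line carries a non-empty BX:Z tag."""
    return BX_TAG.search(header) is not None


def filter_barcoded(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records whose header has a BX:Z tag.

    Records are assumed to span exactly four lines (no wrapped sequences).
    A trailing incomplete record is silently dropped.
    """
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if has_barcode(record[0]):
                yield record
            record = []
```

One way to use it would be to iterate over an open FASTQ file and write each kept record back out joined by newlines. With interleaved paired reads, this keeps pairs intact only if both mates of a pair carry the same tag, which is worth checking on your own data.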