Closed YingChen94 closed 4 months ago
Hello @YingChen94 and thank you for opening an issue. The input from 10x needs to be preprocessed with LongRanger, and the barcodes should be in the read header as a BX:Z tag, the format common for 10x reads. Our experiments on human data showed RAM requirements of up to 200 GB, but memory use does not necessarily scale with genome size; it also depends on coverage. I previously ran it successfully on a plant genome of roughly that size, but the physical maps were not very contiguous. I have no record of the RAM usage, but it was certainly below 1 TB, as that is the maximum on the machine I used. Physlr is a de novo tool and does not require a reference genome, but it can accept one for evaluation purposes. If you trust that the reference genome is closely related, you may use it to get some sense of the quality of the results.
Thank you for the fast reply! Do I need to filter out the sequences without the "BX:Z" barcode (I think those reads have invalid barcodes) before running Physlr?
Yes @YingChen94, I believe you need to filter them before running Physlr. Does Longranger not filter them by default, or when a flag is set?
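If filtering does turn out to be necessary, a minimal sketch in Python could look like the following. It assumes uncompressed four-line FASTQ records with the barcode stored as a whitespace-separated `BX:Z:` field in the header line; the function name `filter_barcoded` is made up for illustration and is not part of Physlr or Longranger. It filters record by record, so for interleaved paired reads it relies on both mates carrying the same tag.

```python
import io


def filter_barcoded(fastq_in, fastq_out):
    """Copy only FASTQ records whose header has a BX:Z: tag.

    Hypothetical helper, not part of Physlr. Returns (kept, dropped).
    """
    kept = dropped = 0
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # skip stray blank lines
        seq = fastq_in.readline()
        plus = fastq_in.readline()
        qual = fastq_in.readline()
        # Header fields after the read name hold optional tags such as BX:Z:<barcode>
        if any(field.startswith("BX:Z:") for field in header.split()[1:]):
            fastq_out.write(header + seq + plus + qual)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# Small in-memory example: one barcoded read, one without a barcode
example = io.StringIO(
    "@read1 BX:Z:AAACCCGT-1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nIIII\n"
)
result = io.StringIO()
counts = filter_barcoded(example, result)
```

For real data, the two `StringIO` objects would be replaced by opened input and output files (or `gzip.open(..., "rt")` handles for compressed input).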
@YingChen94 if you are using the most recent version of Physlr (which uses btllib for indexlr) then it should be able to handle/filter reads with no barcodes as well.
Hi Amirhossein, the output from Longranger includes reads without the "BX:Z" barcode tag. I am using Physlr v1.0.4 and it seems to have run the first step (the histogram) successfully.
Hello,
Thank you for making this tool! I have been struggling with Supernova assembly of 10x data because of the large genome size (4 Gb). Does Physlr have any requirements in terms of computing resources for a genome this large? For building physical maps, is the input file the raw sequencing data, or do I need to do any preprocessing? I am thinking of using a different species in the same genus as the reference genome. Is that appropriate?
Thank you! Ying