Understanding CSFS Samples and Discretization in ASMC's Prepare Decoding Tool

Hi there,

I'm using the compiled C++ version of Prepare Decoding to create decoding quantities files for fastSMC, with the goal of analyzing IBD segments. I have the demo files from ASMC_data and a frequency file made from my own dataset, which includes 1600 samples and around 500,000 variants. For the discretization, I used the file included in the package, "input30-100-2000.disc".

When I set 'CSFSsamples=1600' to match the sample count, I ran into a memory issue that caused a core dump. Lowering 'CSFSsamples' to 300 fixed the problem.

I'm curious what the 'CSFSsamples' value actually means. Does it need to match the number of samples in the frequency file, or in the '.haps', '.samples', and '.map' files that will be used in the fastSMC analysis later (n = 1600)? Also, is there a maximum value for 'CSFSsamples'?

Additionally, I'd like to know how to define my own number of quantiles for the discretization in the C++ version. I noticed that the Python version allows the user to define the discretization like this: discretization=[[30.0, 15], [100.0, 15], 39]. Can you tell me how to do the same in the C++ version?
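
To make clear how I'm reading that Python argument, here is a minimal sketch. It rests on my own assumption, which may be wrong: each [step, count] pair adds `count` intervals of width `step` (in generations), and the trailing integer is the number of additional quantile-based intervals. The helper name `expand_fixed_steps` is hypothetical and not part of ASMC or Prepare Decoding; it only expands the fixed-step part and does not compute any quantiles.

```python
def expand_fixed_steps(discretization):
    """Expand [step, count] pairs into time boundaries (in generations).

    Assumed reading, not confirmed: each [step, count] pair adds `count`
    intervals of width `step`; a trailing integer is the number of extra
    quantile-based intervals, which this sketch only records.
    """
    boundaries = [0.0]
    extra_quantiles = 0
    for item in discretization:
        if isinstance(item, (list, tuple)):
            step, count = item
            for _ in range(count):
                # Each new boundary sits one step after the previous one.
                boundaries.append(boundaries[-1] + step)
        else:
            extra_quantiles = int(item)
    return boundaries, extra_quantiles


boundaries, extra_quantiles = expand_fixed_steps([[30.0, 15], [100.0, 15], 39])
# Number of fixed-width intervals, last fixed boundary, extra quantiles
print(len(boundaries) - 1, boundaries[-1], extra_quantiles)  # 30 1950.0 39
```

If that reading is right, what I'm after is a way to produce the equivalent ".disc" input for the C++ version.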

PalamaraLab / PrepareDecoding, issue #13