Open 7JVST opened 8 months ago
Hi,
The CSFS samples parameter is used in the model to compute some probabilities in the HMM, and there isn't much benefit to setting it above 300. Larger values also increase the computational cost, so it's best to just set it to 300.
Re: discretization, that syntax is only available through the preparedecoding Python tool:
https://pypi.org/project/asmc-preparedecoding/
See example here: https://github.com/PalamaraLab/ASMC_dev/blob/main/notebooks/asmc_w_decodingquant.ipynb
There currently isn't a C++ implementation available.
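For readers unsure how to read a spec like `[[30.0, 15], [100.0, 15], 39]`, here is a minimal sketch in plain Python. It assumes each `[step, count]` pair means `count` consecutive intervals of width `step` (in generations), and that the trailing integer is the number of additional intervals the tool derives from the coalescent distribution. The helper `expand_fixed_steps` is hypothetical and is not part of asmc-preparedecoding; it only expands the fixed-width part and does not compute the quantile-based intervals.

```python
from typing import List, Sequence, Tuple, Union

# One spec entry is either a [step, count] pair or a bare integer.
SpecItem = Union[Sequence[float], int]


def expand_fixed_steps(spec: Sequence[SpecItem]) -> Tuple[List[float], int]:
    """Expand the fixed-width part of a discretization spec.

    Hypothetical helper for illustration only. Returns the interval
    boundaries implied by the [step, count] pairs, plus the trailing
    integer (assumed to be the number of quantile-based intervals).
    """
    boundaries = [0.0]
    n_quantiles = 0
    for item in spec:
        if isinstance(item, int):
            # Bare integer: left for the tool to fill in from the
            # coalescent distribution (assumption, not computed here).
            n_quantiles = item
        else:
            step, count = item
            for _ in range(int(count)):
                boundaries.append(boundaries[-1] + float(step))
    return boundaries, n_quantiles


boundaries, n_quantiles = expand_fixed_steps([[30.0, 15], [100.0, 15], 39])
print(boundaries[:4], "...", boundaries[-1], "then", n_quantiles, "quantile intervals")
```

Under that reading, the fixed part covers 15 steps of 30 followed by 15 steps of 100 generations, and the remaining 39 intervals are chosen by the tool.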
Hi there,
I'm using the compiled C++ version of Prepare Decoding to create decoding quantities files for fastSMC, with the aim of analyzing IBD segments. I've got demo files from ASMC_data and frequency files made from my own dataset, which includes 1600 samples and around 500,000 variants. For the discretization, I used the disc file included in the package, "input30-100-2000.disc".
When I tried setting 'CSFSsamples=1600' to match the sample count, I ran into a memory issue causing a core dump. However, lowering 'CSFSsamples' to 300 fixed the problem.
I'm curious what the 'CSFS samples' count actually means. Does it need to match the sample count in the frequency file, or in the '.haps', '.samples', and '.map' files that will be used in the fastSMC analysis later (n = 1600)? Also, is there a maximum limit for the 'CSFS samples' count?
Additionally, I'd like to know how to define my own number of quantiles for discretization in the C++ version. I noticed the Python version allows the user to define the discretization like this: discretization=[[30.0, 15], [100.0, 15], 39]. Can you tell me how to do this in the C++ version?