3DGenomes / binless

Resolution-independent normalization of Hi-C data
GNU Lesser General Public License v3.0

"std::bad_alloc" when running optimized_binless.R #19

Closed: maowenwen closed this issue 4 years ago

maowenwen commented 4 years ago

Hello professor, the server I used has 512 GB of memory. I took two 10 Mb ChIA-PET datasets as input, not Hi-C data. When I ran preprocessing.R, everything went well. But when I ran optimized_binless.R, it always failed during the first iteration, while calculating the genomic biases, as shown below.

Normalization with fast approximation and performance iteration
No initial guess provided
Preparing for signal estimation
Counting zeros
Initial guess: residuals
Initial guess: exposures
Initial guess: decay
Initial guess: biases

Iteration 1

Residuals
Exposures
log-likelihood = 19.50625
Dispersion fit: alpha 14121.34 log-likelihood = -140099.9
Genomic group 1 :
Error in generate_spline_base(cutsites, min(cutsites), max(cutsites), : std::bad_alloc
Timing stopped at: 49.94 47.93 42.57

I looked through all the function definitions, but I couldn't find the function "generate_spline_base". Is something wrong with this function, or with my data?

yannickspill commented 4 years ago

Yes, this is a classic confusion. You should use fast_binless.R or chromosome_binless.R for regions larger than 1 to 2 Mb. Optimized binless (see the optimized_binless.R tutorial) can normalize regions with many more than 100 restriction sites, but becomes unstable on regions larger than 1 to 2 Mb. Beyond that, you must use fast binless (see fast_binless.R) or a combination of optimized and fast binless (chromosome_binless.R).

maowenwen commented 4 years ago

I ran fast_binless.R as you said. I waited one hour but nothing came out, and then the server stopped working. I wonder if the 10 Mb data is still too big. As before, I looked through all the function definitions, but I couldn't find the function "fast_binless". Where can I find its definition?

yannickspill commented 4 years ago

fast_binless is an Rcpp wrapper for the C++ function fast::binless, defined in binless/src/fast_binless.hpp. Note that you must pass binned data; try binning it at 5 kb. The tutorial files explain how to do this, see preprocessing.R.

maowenwen commented 4 years ago

Hi professor, I'm going to calculate the true positive rate. I looked at 'Benchmark: interaction detection' in Methods, and I also found data in Supplementary Data panel 9. But I'm still not sure how TN and TP are calculated. I've now got the normalized matrix using my own data. How can I get the true positive rate for my own data?

yannickspill commented 4 years ago

I'm sorry, but this is not the place to discuss true/false positives. This thread is about std::bad_alloc, and the GitHub issue tracker is only for issues related to the execution of binless, not data analysis in general. I believe the paper describes what should be done to compute these quantities. If things are unclear, you can write to the e-mail address in the paper, but please be more specific than that.