Closed: danjgates closed this issue 4 years ago
Hello Dan, I suspect that what you see is a side effect of a memory-allocation optimization we have implemented in RAiSD. Can you send me the VCF file and the two versions of RAiSD you used, so that I can test this further and let you know? n.alachiotis@gmail.com
Hello Dan,
What you observe is indeed a side effect of the memory-allocation optimization we have implemented in RAiSD. This will be properly fixed in the next major RAiSD release, which I estimate to be in November. Based on your dataset size, a quick fix to overcome this is to change line 64 in the RAiSD.h file from:
#define PATTERNPOOL_SIZE 1
to
#define PATTERNPOOL_SIZE 100
This will effectively prevent the optimization from taking place given your dataset size, but will make RAiSD run considerably longer (it will take about 2 hours instead of a few minutes). You need to run "make clean" and then "make" again for this change to take effect.
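Since line numbers in a header can shift between versions, the edit described above can also be applied by matching the macro name rather than relying on line 64. The following is a minimal sketch, not part of RAiSD: the helper name `set_define` is hypothetical, and it assumes the header contains exactly one `#define PATTERNPOOL_SIZE <value>` line.

```python
import re


def set_define(header_text: str, name: str, value: int) -> str:
    """Return header_text with the value of `#define <name> <old>` replaced.

    Raises ValueError unless exactly one matching definition is found,
    so a silent no-op cannot be mistaken for a successful edit.
    """
    pattern = re.compile(
        r"^(?P<head>[ \t]*#[ \t]*define[ \t]+" + re.escape(name) + r"[ \t]+)(?P<old>\S+)",
        re.MULTILINE,
    )
    matches = pattern.findall(header_text)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one #define {name}, found {len(matches)}")
    # Keep the directive and any trailing comment; swap only the value token.
    return pattern.sub(lambda m: m.group("head") + str(value), header_text)


if __name__ == "__main__":
    # Stand-in for the relevant line of RAiSD.h (contents assumed from the thread).
    sample = "#define PATTERNPOOL_SIZE 1\n"
    print(set_define(sample, "PATTERNPOOL_SIZE", 100), end="")
```

To apply it to a real checkout, read RAiSD.h, pass its text through `set_define`, write the result back, and then rebuild with "make clean" followed by "make" as described above.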
Also, you can consider using the RAiSD version that parses the .gz file directly, not the unzipped one. You can do that by using the MakefileZLIB makefile like this: make -f MakefileZLIB
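These are the plots generated by RAiSD: [image: plot] https://user-images.githubusercontent.com/1485578/65217553-0f302b80-dabd-11e9-864d-b91df3f9f9d0.png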
Best regards, Nikos A.
Thank you so much for this. If it takes a few hours instead of a few minutes it's fine by me.
Cheers, -Dan
On Wed, Sep 18, 2019 at 11:09 PM alachins notifications@github.com wrote:
Hello Dan,
What you observe is indeed a side effect of the memory-allocation optimization we have implemented in RAiSD. This will be properly fixed in the next major RAiSD release, which I estimate to be in November. Based on your dataset size, a quick fix to overcome this is to change line 64 in RAiSD.h file from:
#define PATTERNPOOL_SIZE 1
to
#define PATTERNPOOL_SIZE 100
This will practically prevent the optimization from taking place given your dataset size, but will make RAiSD run considerably longer (it will take about 2 hours instead of some minutes). You need to "make clean" and then "make" again, in order for this change to take place.
Also, you can consider using the RAiSD version that parses the .gz file directly, not the unzipped one. You can do that by using the MakefileZLIB makefile like this: make -f MakefileZLIB
These are the plots generated by RAiSD:
[image: plot] https://user-images.githubusercontent.com/1485578/65217553-0f302b80-dabd-11e9-864d-b91df3f9f9d0.png
Best regards, Nikos A.
The workaround I proposed of changing PATTERNPOOL_SIZE is no longer required. This is now properly fixed as of version 2.4, and RAiSD runs at its original speed without producing inflated values along the chromosome, regardless of size.
Best regards, Nikos
Greetings,
I am using RAiSD on a VCF dataset of 30 deep-sequenced maize individuals (~5-10 million SNPs per chromosome), but I have run into an issue where the p-value increases across the chromosome (see attached figure). This pattern was explained by the Var parameter, and I suspected it had something to do with a recent change. I downloaded an older version (34bd5708456a30ff8972f6bb367dfd40c7eff6df), and the increasing p-values are not present when it is run on the same dataset.
My understanding of the change is that it would allow comparisons of chromosomes of dramatically different lengths. I have run into an issue on the old version where the p-values from a simulated 10K chromosome are many orders of magnitude different from the p-values of my 150MB chromosome, so the fix sounds quite relevant to how I'm hoping to proceed. The problem, however, is that I'm not certain that the current version, where the p-values increase along my chromosomes, will work.
Is there a simple fix for this, or an argument that I'm not aware of that would fix this for me?
Thanks! Dan Gates