alexlancaster / pypop

PyPop: Python for Population Genomics
http://pypop.org
GNU General Public License v2.0

pypop crash with about 5000 samples #52

Closed afadda91 closed 1 year ago

afadda91 commented 6 years ago

Hi, I get this error when the number of samples exceeds 1000: Error in '/gpfs/software/tools/python2.7/bin/python': double free or corruption (!prev): 0x00000000030202d0 etc.

What's your recommendation?

Thanks

alexlancaster commented 6 years ago

Which version are you using? You'll need to provide the input .pop and .ini files; otherwise it's impossible to tell what's happening. There are hard limits in versions <= 0.7 when trying to estimate haplotypes.

afadda91 commented 6 years ago

Hi Alex,

Sorry for the long silence. There were missing python libraries on the HPC and it just got sorted out.

I was able to run 1000 samples with pypop but not 2000. I can’t share the .pop file with you, but I’m sharing an example .pop generated by the same script. I’m also sharing the error, the output, and the .ini. It’s confidential data and no download is allowed, hence I made a PDF file out of screenshots.

Thanks,

Abeer

Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center.

afadda91 commented 6 years ago

Dear Alex,

It runs properly until 1800 samples.

Abeer
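The threshold search described above (1000 works, 1800 works, 2000 fails) can be automated by writing truncated copies of the input and running each one. The following is a minimal sketch, not part of PyPop: the function names are hypothetical, and it assumes the .pop file is plain text with a single column-header line followed by one row per sample. If your file has a population header block, adjust `n_header_lines` accordingly.

```python
import io


def head_samples(lines, n_samples, n_header_lines=1):
    """Return the header lines plus the first n_samples non-blank data rows.

    Assumption: the file has n_header_lines header lines, then one row
    per sample. This layout is not confirmed by the thread.
    """
    lines = list(lines)
    header = lines[:n_header_lines]
    data = [ln for ln in lines[n_header_lines:] if ln.strip()]
    return header + data[:n_samples]


def write_subset(src_path, dst_path, n_samples, n_header_lines=1):
    """Write a truncated copy of src_path containing n_samples rows."""
    with io.open(src_path, encoding="utf-8") as src:
        subset = head_samples(src, n_samples, n_header_lines)
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(subset)
```

Calling `write_subset("full.pop", "first1800.pop", 1800)` (file names are placeholders) and running PyPop on the result makes it easy to bisect between a size that works and one that crashes.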

patjeanne commented 2 years ago

Hello!

I've been trying to run more than 5,000 samples. I read in the PyPop manual that I could change the limits defined in emhaplofreq.h and recompile, so I used the numbers suggested in this forum. I've done the build and installation steps, but when I try to run with my samples, I get the following errors:

LOG: Data file has no header data block
WARNING: The [Emhaplofreq] module is officially DEPRECATED and may be removed in coming releases.
Please transition to using the new [Haplostats] module.
LOG: estimating all pairwise LD: with 1000 permutations and 5 initial conditions for each permutation
wildcard '*' given for lociToEstHaplo, assume entire data set
LOG: estimating haplotype frequencies for all two locus haplotypes, specific haplotypes: [A:C:B:DRA:DRB1:DQA1:DQB1:DPA1:DPB1]
Traceback (most recent call last):
  File "./bin/pypop.py", line 336, in <module>
    testMode=testMode)
  File "/home/patricia/alexlancaster-pypop-0a99ba4/bin/../PyPop/Main.py", line 440, in __init__
    self._genTextOutput()
  File "/home/patricia/alexlancaster-pypop-0a99ba4/bin/../PyPop/Main.py", line 1242, in _genTextOutput
    import libxml2
ImportError: No module named libxml2

I hope you can help me, please.

alexlancaster commented 2 years ago

What platform are you compiling on? It looks like you might be missing the XML Python libraries. If you're on a Linux distribution you may be able to install them using your package manager.

As for the maximum size, for haplotype and LD estimation, theoretically there is no limit, but practically you'll run into memory limits when you get above 5-10k. @rsingle may have some more insights as to the maximum number of samples PyPop has been applied to in practice.
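Before re-running a long job, it can help to confirm that the missing XML bindings are now importable from the same interpreter that runs pypop.py. This is a minimal sketch: the helper name is hypothetical, and `libxml2` is the only module name taken from the traceback above.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # 'libxml2' is the module named in the ImportError above.
    print(missing_modules(["libxml2"]))
```

An empty list means the import that failed in `_genTextOutput` should now succeed; a non-empty list means the bindings are still not visible to this Python installation or virtual environment.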

patjeanne commented 2 years ago

Thank you for the answer. I've solved the problem with the XML Python libraries (I am trying to run on both Linux and Windows). I've done the build, but now, when I try to run using the command:

python pypop.py -c sample.ini HD.pop

or

python pypop.py -i

I get the error below:

(phd27) @.:~/alexlancaster-pypop-0a99ba4/bin$ python pypop.py -i -l
PyPop: Python for Population Genomics (0.8)
Copyright (C) 2003-2005 Regents of the University of California
This is free software.  There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You may redistribute copies of PyPop under the terms of the
GNU General Public License.  For more information about these
matters, see the file named COPYING.

To accept the default in brackets for each filename, simply press
return for each prompt.

Please enter config filename [config.ini]: sample.ini
Please enter population filename [no default]: HD.pop
PyPop is processing HD.pop ...
LOG: Data file has no header data block
Traceback (most recent call last):
  File "pypop.py", line 336, in <module>
    testMode=testMode)
  File "/home/patricia/alexlancaster-pypop-0a99ba4/bin/../PyPop/Main.py", line 303, in __init__
    debug=self.debug)
  File "/home/patricia/alexlancaster-pypop-0a99ba4/bin/../PyPop/ParseFile.py", line 396, in __init__
    self._genDataStructures()
  File "/home/patricia/alexlancaster-pypop-0a99ba4/bin/../PyPop/ParseFile.py", line 494, in _genDataStructures
    col1, col2 = self.alleleMap[locus]
TypeError: 'int' object is not iterable

Please help me.

Thank you,

Patricia Jeanne
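The final line of the traceback above says that `self.alleleMap[locus]` held a single integer where the code expected two values to unpack. The sketch below reproduces that class of error in isolation. It is not PyPop code: the dictionary contents and the helper name are hypothetical, and reading the two values as column indices is an assumption based only on the names `col1` and `col2`. Why PyPop stored a single integer for this input cannot be determined from the thread.

```python
def unpack_columns(allele_map, locus):
    """Unpack a (col1, col2) pair for a locus, with a clearer error.

    allele_map is a hypothetical stand-in for the alleleMap attribute
    seen in the traceback; its real structure is not shown in the thread.
    """
    entry = allele_map[locus]
    try:
        col1, col2 = entry
    except TypeError:
        # An int (or other non-iterable) was stored instead of a pair.
        raise TypeError(
            "locus %r maps to %r, expected a pair of values" % (locus, entry)
        )
    return col1, col2


# A locus with two values unpacks normally.
print(unpack_columns({"A": (1, 2)}, "A"))
```

A locus that maps to a single integer, such as `{"B": 3}`, raises `TypeError` at the unpacking step, which matches the failure at `ParseFile.py` line 494.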

alexlancaster commented 1 year ago

Hi there @patjeanne. Is this still an issue? You have a couple of options to make progress:

  1. test new binaries here: http://pypop.org/docs/guide-chapter-install.html (although they will still have the same hard-coded limits in emhaplofreq as before).
  2. Or you can try the developer instructions here: http://pypop.org/docs/guide-chapter-contributing.html#installation-for-developers (and make your changes to the .h).

Either way, please test against the current version, and if you are still having issues, please include the .ini and .pop file so we can reproduce. (If you can't share the file publicly let me know). If I don't hear back soon, I'll close this issue as obsolete, but you can always re-open a new issue if need be.

alexlancaster commented 1 year ago

Closing this issue as obsolete, but you can always open a new issue if you can reproduce errors with the new version. Remember, large numbers of samples (more than roughly 5000) aren't supported for [Emhaplofreq].