itslittman opened this issue 1 year ago
Ok, it works if I use a smaller subset of the eventalign file, so it must be a memory issue. How do I get this to run on the full file without memory usage going crazy? It gets killed after eating up around 200 GB.
Then you need to run your experiment on a server with more memory.
Realistically, how much RAM do I need to run this assuming the sizes of my files are the same as those used in the original paper?
On an Ubuntu Linux server with 128 GB of RAM, 16 Intel Xeon E5-2609 1.7 GHz CPU cores, and 8 GPU cards.
Is this the kind of thing that can be processed in chunks?
Yes.
What would be the best way to do this? I'm not great with Python. Thank you!!
Please have a look at:
https://stackoverflow.com/questions/17315737/split-a-large-pandas-dataframe?noredirect=1&lq=1
or
https://stackoverflow.com/questions/44729727/pandas-slice-large-dataframe-into-chunks
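The approaches in those links boil down to splitting an already-loaded DataFrame into pieces. A minimal sketch of that idea (the DataFrame here is a hypothetical stand-in for the eventalign table, not the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the eventalign table.
df2 = pd.DataFrame({
    "model_kmer": ["AAAAA", "TTTTT", "GGGGG", "TTTAA"],
    "value": [1, 2, 3, 4],
})

# Split the DataFrame into roughly equal chunks so each can be
# processed independently.
n_chunks = 2
chunks = np.array_split(df2, n_chunks)
for chunk in chunks:
    print(len(chunk))
```

Note that `np.array_split` still requires the full DataFrame in memory first; for a file that doesn't fit in RAM, reading in chunks with `pd.read_csv(..., chunksize=...)` (shown further down) is the better fit.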
So, should I split the file into chunks, process each chunk, and then, when it gets to:
df = df2[df2['model_kmer'].isin(U_kmer_list)]
instead have:
df = chunk[chunk['model_kmer'].isin(U_kmer_list)] ?
And then how do I get it to loop back for the rest of the chunks? Before, I was reading in each chunk and then immediately concatenating them all back together, which, I've realized, obviously defeats the purpose of chunking in the first place.
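One way to keep the loop while avoiding the concatenate-everything trap is to filter each chunk as it is read and concatenate only the surviving rows. A minimal sketch, assuming the eventalign file is tab-separated with a `model_kmer` column (the in-memory `io.StringIO` here stands in for the real file path, and the kmer whitelist is hypothetical):

```python
import io
import pandas as pd

# Hypothetical kmer whitelist; in the Nm-Nano scripts this is U_kmer_list.
U_kmer_list = ["TTTTT", "TTTAA"]

# Tiny in-memory stand-in for the (huge) eventalign file;
# in practice, pass the file path to read_csv instead.
eventalign = io.StringIO(
    "contig\tmodel_kmer\n"
    "c1\tTTTTT\n"
    "c1\tAAAAA\n"
    "c1\tTTTAA\n"
)

filtered_parts = []
# chunksize turns read_csv into an iterator of DataFrames, so the
# whole file is never resident in memory at once. For a real file,
# something like chunksize=1_000_000 is a reasonable starting point.
for chunk in pd.read_csv(eventalign, sep="\t", chunksize=2):
    # Filter each chunk immediately; only matching rows are retained.
    filtered_parts.append(chunk[chunk["model_kmer"].isin(U_kmer_list)])

# Concatenating only the filtered rows keeps peak memory low.
df = pd.concat(filtered_parts, ignore_index=True)
print(len(df))  # 2
```

The key point is that the filter runs inside the loop, before any concatenation, so peak memory is bounded by one chunk plus the accumulated matches rather than by the full file.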
Noah
Hi, I have all the alignment/eventalign and coordinate files generated successfully. However, when I run the model scripts (XGBoost/RF), I see no output except a .pyc file in the __pycache__ folder. There is no error, and memory usage is heavy (so it seems to be working), but the scripts just run, finish, and produce no output files whatsoever.