Janga-Lab / Nm-Nano

No output for models #3

Open · itslittman opened 1 year ago

itslittman commented 1 year ago

Hi, I have all the alignment/eventalign and coordinate files successfully generated. However, when I run the model scripts (xgboost/RF), I don't see any output except a .pyc file in the __pycache__ folder. There is no error, and the process uses a lot of memory (so it seems to be working). It just runs, finishes, and produces no output files at all.

itslittman commented 1 year ago

OK, it works if I use a smaller subset of the eventalign file, so it must be a memory issue. How do I get this to run on the full file without memory usage going out of control? It gets killed after consuming around 200 GB.

hsdoaa commented 1 year ago

Then you need to run your experiment on a server with more memory.

itslittman commented 1 year ago

Realistically, how much RAM do I need to run this assuming the sizes of my files are the same as those used in the original paper?

hsdoaa commented 1 year ago

We ran it on an Ubuntu Linux server with 128 GB of RAM, 16 Intel Xeon E5-2609 1.7 GHz CPU cores, and 8 GPU cards.

itslittman commented 1 year ago

Is this the kind of thing that can be processed in chunks?

hsdoaa commented 1 year ago

Yes.

itslittman commented 1 year ago

What would be the best way to do this? I'm not great with Python. Thank you!!

hsdoaa commented 1 year ago

Please have a look at:

https://stackoverflow.com/questions/17315737/split-a-large-pandas-dataframe?noredirect=1&lq=1

or

https://stackoverflow.com/questions/44729727/pandas-slice-large-dataframe-into-chunks
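
For illustration, a minimal sketch of the approach from those links (this is not code from the Nm-Nano scripts; the file path, separator, and chunk count below are placeholders you would need to adapt):

import numpy as np
import pandas as pd

# load the table once, then split it into smaller DataFrames and process them one at a time
df2 = pd.read_csv("eventalign.txt", sep="\t")   # placeholder path
n_chunks = 10                                   # choose based on available memory

for chunk in np.array_split(df2, n_chunks):
    # ... run the per-chunk processing here ...
    pass

Note that np.array_split still loads the full table first; pandas can also read the file in pieces with the chunksize argument, which avoids holding everything in memory at once.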

itslittman commented 1 year ago

So, should I split into chunks, process each chunk, then when it gets to:

df=df2[df2['model_kmer'].isin(U_kmer_list)]

I should instead have:

df=chunk[chunk['model_kmer'].isin(U_kmer_list)] ?

And then how do I get it to loop back over the rest of the chunks? Before, I was reading in each chunk and immediately concatenating them all back together, which I realize defeats the purpose of chunking in the first place.

Noah
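
A minimal sketch of what that loop could look like (assuming the eventalign output is a tab-separated text file; the path, chunk size, and placeholder k-mer list below are not from the Nm-Nano scripts and would need to be adapted):

import pandas as pd

U_kmer_list = ["TTTTT"]   # placeholder; use the k-mer list built in the Nm-Nano script
filtered_parts = []

# read the eventalign file in pieces instead of loading it all at once
for chunk in pd.read_csv("eventalign.txt", sep="\t", chunksize=1_000_000):
    # keep only the rows whose model_kmer is in U_kmer_list, as in the original line,
    # so only the (much smaller) filtered rows are kept around
    filtered_parts.append(chunk[chunk['model_kmer'].isin(U_kmer_list)])

df = pd.concat(filtered_parts, ignore_index=True)

Concatenating only the filtered rows at the end, rather than the raw chunks, is what keeps the memory footprint small.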