broadinstitute / CP186-A549-WG

Error in 1.generate-profiles/1.aggregate #8

Closed ErinWeisbart closed 2 years ago

ErinWeisbart commented 2 years ago

As noted in #6, even on the largest reasonable machine (976 GB of memory), the step made it through appending all sites and then crashed while building the single-cell dataframe. The crash was silent, which suggests an out-of-memory error.

ErinWeisbart commented 2 years ago

I was asking questions about CP257 to see if it could help me get unstuck here with CP186. I'm still confused about what to do here with CP186 so I'm moving discussion here.

With CP186, I've currently made it through 1.generate-profiles/0.merge-single-cells. In config/options.yaml I had to set profile:output_one_single_cell_file_only:false to avoid memory errors, so I now have data/1.profiles/BATCH/single_cell/SITE/{site}_single_cell.csv.gz generated for each site in CP186.

I now cannot get through 1./1.aggregate without running out of memory. Greg said (about running into the same memory error during aggregation of CP257) "I've overcome this issue by generating plate-level profiles".

@MerajRamezani Can you tell me exactly what to set in the config files to "generate plate-level profiles"? Does that mean starting at 1./1.aggregate using

  split:
    profile:
      batches: false
      plates: true
      wells: false

MerajRamezani commented 2 years ago

@ErinWeisbart I believe for CP257 we did it like this:

  split:
    profile:
      batches: true
      plates: true
      wells: false

Additionally in the options.yml for normalize and feature_select we had this:

levels:

(No single_cell to be clear!)

Hope this is helpful!

ErinWeisbart commented 2 years ago

Thanks Meraj! Testing these parameters now, except using batches: false, because CP257 had DMEM and HPLM arms while CP186 is all a single arm.

ErinWeisbart commented 2 years ago

Worked! Memory usage details in #6
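
For anyone landing here later, a rough sketch of why plate-level aggregation keeps memory bounded. This is not the pipeline's actual code: the function name `aggregate_plate`, the column names `Metadata_Plate` and `Metadata_Well`, and the choice of a mean are all assumptions made for illustration (the real aggregate step may use a different statistic). The idea is that each per-site file is streamed on its own and only running totals per well are kept, so no single dataframe ever holds every cell.

```python
import csv
import gzip
from collections import defaultdict


def aggregate_plate(site_files, feature_cols,
                    plate_col="Metadata_Plate", well_col="Metadata_Well"):
    """Return {(plate, well): {feature: mean}} by streaming site files one at a time.

    Column names and the mean operation are illustrative assumptions,
    not taken from the pipeline itself.
    """
    # Running sums and cell counts per (plate, well); memory grows with the
    # number of wells, not with the number of cells.
    sums = defaultdict(lambda: [0.0] * len(feature_cols))
    counts = defaultdict(int)

    for path in site_files:
        # Each gzipped per-site CSV is read row by row and never held in full.
        with gzip.open(path, "rt", newline="") as handle:
            for row in csv.DictReader(handle):
                key = (row[plate_col], row[well_col])
                totals = sums[key]
                for i, col in enumerate(feature_cols):
                    totals[i] += float(row[col])
                counts[key] += 1

    # Convert running sums into per-well means.
    return {
        key: {col: total / counts[key] for col, total in zip(feature_cols, totals)}
        for key, totals in sums.items()
    }
```

It would be called once per plate with that plate's `{site}_single_cell.csv.gz` files. Running sums only work for a mean; a median would need a different approach.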