Open gwaybio opened 3 years ago
I've overcome this issue by generating plate-level profiles, however, a new error appeared:
```
Now normalizing gene...with operation: standardize for spilt ALLBATCHES___CP257A___ALLWELLS
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/sklearn/utils/extmath.py:847: RuntimeWarning: invalid value encountered in true_divide
  updated_mean = (last_sum + new_sum) / updated_sample_count
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/sklearn/utils/extmath.py:689: RuntimeWarning: Degrees of freedom <= 0 for slice.
  result = op(x, *args, **kwargs)
Now normalizing guide...with operation: standardize for spilt ALLBATCHES___CP257A___ALLWELLS
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/sklearn/utils/extmath.py:847: RuntimeWarning: invalid value encountered in true_divide
  updated_mean = (last_sum + new_sum) / updated_sample_count
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/sklearn/utils/extmath.py:689: RuntimeWarning: Degrees of freedom <= 0 for slice.
  result = op(x, *args, **kwargs)
Now normalizing single_cell...with operation: standardize for spilt ALLBATCHES___CP257A___ALLWELLS
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/sklearn/utils/extmath.py:847: RuntimeWarning: invalid value encountered in true_divide
  updated_mean = (last_sum + new_sum) / updated_sample_count
/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/sklearn/utils/extmath.py:689: RuntimeWarning: Degrees of freedom <= 0 for slice.
  result = op(x, *args, **kwargs)
```
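These RuntimeWarnings come from sklearn's incremental mean/variance helper in `extmath.py`. One common trigger (an assumption here, not confirmed from the profiles themselves) is a feature column that is entirely NaN: the per-column sample count drops to zero, so the mean is computed as 0/0 and the variance is left with no degrees of freedom. A minimal NumPy-only sketch of that mechanism, using made-up data:

```python
import numpy as np

# Hypothetical data: one normal feature and one all-NaN feature,
# mimicking what sklearn's standardize step sees internally.
X = np.array([
    [1.0, np.nan],
    [2.0, np.nan],
    [3.0, np.nan],
])

# Per-column count of non-NaN samples; the all-NaN column counts 0.
counts = np.sum(~np.isnan(X), axis=0)

# Dividing the column sums by a zero count yields 0/0 -> nan, the same
# "invalid value encountered in true_divide" situation reported above.
with np.errstate(invalid="ignore"):
    means = np.nansum(X, axis=0) / counts

print(counts)  # [3 0]
print(means)   # first entry 2.0, second entry nan
```

If this is indeed the cause, the warnings are harmless for the valid features but leave NaN columns in the normalized output, which feature selection would then need to drop.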
@gwaygenomics I ran the normalization and feature selection, but focusing only on the plate-level profiles after aggregation. After a successful test on my local computer, I ran the last two steps of the recipe (normalization and feature selection) for the rest of the files on AWS by removing `- single_cell` from `levels` in the `options.yaml` config file.
@gwaygenomics Am I understanding

> I've overcome this issue by generating plate-level profiles

to say that you were never able to run 1./1.aggregate without splitting the data further because it takes too much memory?
If I need to make the same split while processing a different batch, do I change config/experiment.yaml to the following?

```yaml
split:
  qc:
    batches: false
    plates: false
    wells: false
  profile:
    batches: false
    plates: true
    wells: false
```
And then @MerajRamezani you're saying that after aggregation by plate, for 1./2.normalization and 1./3.feature-selection you set config/options.yaml to

```yaml
levels:
  - gene
  - guide
```

You're saying that was necessary to avoid the error Greg mentioned above?
@gwaygenomics @MerajRamezani can you take a look at this so I can get unstuck? Thanks!
@ErinWeisbart My understanding was that the weld process was failing when it was handling the normalization of single-cell profiles. My guess is that normalizing profiles from all cells in a plate might have overloaded the memory. It is useful to have single-cell profiles normalized at the plate level, but it is not essential to start with. So basically I took the single-cell profiles in one csv.gz (at the plate level), aggregated them at both the gene and guide levels, and then ran normalization at the plate level followed by feature selection.
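The per-plate workaround described above (aggregate the single-cell csv.gz to gene/guide level first, then normalize the much smaller aggregated table) can be sketched with plain pandas. The column names, and the median/z-score choices, are stand-ins for illustration, not the recipe's actual schema or implementation:

```python
import pandas as pd

# Hypothetical single-cell profiles for one plate; column names are
# illustrative, not the recipe's actual schema.
single_cells = pd.DataFrame({
    "Metadata_Guide": ["g1", "g1", "g2", "g2"],
    "Cells_AreaShape_Area": [100.0, 120.0, 80.0, 90.0],
})

# 1) Aggregate to guide level: one median profile per guide.
guide_profiles = (
    single_cells.groupby("Metadata_Guide", as_index=False).median()
)

# 2) Standardize each feature of the small aggregated table, avoiding
#    a standardization pass over millions of single-cell rows.
feature = "Cells_AreaShape_Area"
guide_profiles[feature] = (
    guide_profiles[feature] - guide_profiles[feature].mean()
) / guide_profiles[feature].std()

print(guide_profiles)
```

Aggregating first shrinks the data from one row per cell to one row per guide (or gene), which is why the memory pressure of the normalization step drops so sharply.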
We've seen this error before; it has to do with the machine not being large enough to write out the full single-cell file per plate.
Here is the error where it halted: