AllenCellModeling / datasetdatabase

Modeling DB Schema, Creation, and IO

Error during get_dataset -- Cannot convert float NaN to integer #31

Closed: vianamp closed this issue 6 years ago

vianamp commented 6 years ago

I am trying to get a recently uploaded dataset called QCB_FOV_feature with:

import datasetdatabase as dsdb
prod = dsdb.DatasetDatabase(config='/allen/aics/assay-dev/Analysis/QCB_database/prod_config.json')

prod.get_dataset(name='QCB_FOV_feature')

and I am getting this error:

Reconstructing dataset...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-c77c49360c15> in <module>()
      2 prod = dsdb.DatasetDatabase(config='/allen/aics/assay-dev/Analysis/QCB_database/prod_config.json')
      3 
----> 4 prod.get_dataset(name='QCB_FOV_feature')

~/anaconda3/envs/qcb/lib/python3.6/site-packages/datasetdatabase/core.py in get_dataset(self, id, name)
   1090         # reconstruct object
   1091         obj = RECONSTRUCTOR_MAP[ds_info.introspector](
-> 1092             db=self.db, ds_info=ds_info)
   1093 
   1094         return Dataset(dataset=obj, ds_info=ds_info)

~/anaconda3/envs/qcb/lib/python3.6/site-packages/datasetdatabase/introspect/dataframe.py in reconstruct(db, ds_info)
    381     with Pool(n_threads) as pool:
    382         # map pool
--> 383         rows = pool.map(func, group_datasets)
    384 
    385     # sort

~/anaconda3/envs/qcb/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    264         in a list that is returned.
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 
    268     def starmap(self, func, iterable, chunksize=None):

~/anaconda3/envs/qcb/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

~/anaconda3/envs/qcb/lib/python3.6/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    117         job, i, func, args, kwds = task
    118         try:
--> 119             result = (True, func(*args, **kwds))
    120         except Exception as e:
    121             if wrap_exception and func is not _helper_reraises_exception:

~/anaconda3/envs/qcb/lib/python3.6/multiprocessing/pool.py in mapstar(args)
     42 
     43 def mapstar(args):
---> 44     return list(map(*args))
     45 
     46 def starmapstar(args):

~/anaconda3/envs/qcb/lib/python3.6/site-packages/datasetdatabase/introspect/dataframe.py in _reconstruct_group(group_dataset, database, progress_bar)
    338 
    339     # get label
--> 340     label = int(float(group_dataset["Label"]))
    341 
    342     # create group

ValueError: cannot convert float NaN to integer
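
The last frame comes down to a single conversion, int(float(group_dataset["Label"])), and it fails because the stored Label value is NaN. The minimal reproduction below uses plain Python with no datasetdatabase import. parse_label and parse_label_or_none are hypothetical names used only for illustration:

```python
import math


def parse_label(raw):
    # Same conversion as the failing line in _reconstruct_group:
    # int(float(group_dataset["Label"]))
    return int(float(raw))


def parse_label_or_none(raw):
    # Hypothetical defensive variant: report a missing label as None
    # instead of raising.
    value = float(raw)
    return None if math.isnan(value) else int(value)


print(parse_label("3"))  # 3
try:
    parse_label(float("nan"))
except ValueError as err:
    print(err)  # cannot convert float NaN to integer
print(parse_label_or_none(float("nan")))  # None
```

Silently skipping NaN labels would only hide the real problem. In this thread, the real problem turned out to be the state of the dataframe index at upload time.
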
evamaxfield commented 6 years ago

@vianamp Did you upload this dataset?

vianamp commented 6 years ago

Yes, at least I think so. I ran: ds.upload_to(prod)

evamaxfield commented 6 years ago

Did you do any dataset prep before upload? Reindex, sort, etc.? I ask because this is exactly the issue we had previously when a dataset was sorted but not reindexed.
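
The following small pandas sketch shows why sorting without reindexing matters. The column names and values are made up. After sort_values, the row labels no longer match row positions, so any code that mixes label-based and positional access picks different rows:

```python
import pandas as pd

df = pd.DataFrame({"cell_id": [30, 10, 20], "area": [3.0, 1.0, 2.0]})
sorted_df = df.sort_values("cell_id")

# Sorting reorders rows but keeps the original index labels.
print(sorted_df.index.tolist())  # [1, 2, 0]

# Label-based and position-based lookups now disagree.
print(sorted_df.loc[0, "cell_id"])        # 30 (the row labelled 0)
print(sorted_df["cell_id"].iloc[0])       # 10 (the first row by position)
```
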

vianamp commented 6 years ago

Yes. Feature datasets have been re-indexed by cell_id.
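
Reindexing by cell_id replaces the 0..n-1 row labels with cell ids. The hypothetical sketch below shows one plausible way this yields NaN: when pandas assigns a Series as a column, it aligns on index labels, so values keyed by position land nowhere. It only illustrates the mechanism. The thread does not show the datasetdatabase ingest code, and the "Label" values here are invented:

```python
import pandas as pd

features = pd.DataFrame(
    {"cell_id": [10, 20, 30], "area": [1.0, 2.0, 3.0]}
).set_index("cell_id")

# A Series built with a default 0..n-1 index, i.e. keyed by position.
labels = pd.Series([0, 1, 2])

# Column assignment aligns on index labels, not positions. None of
# 0, 1, 2 exist in the cell_id index, so every value becomes NaN.
features["Label"] = labels
print(features["Label"].isna().all())  # True
```
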

evamaxfield commented 6 years ago

After you re-indexed, did you run dataframe = dataframe.reset_index(drop=True)?
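
For reference, the following toy example shows what the suggested reset_index(drop=True) call does in plain pandas. There is one caveat for a frame that was reindexed by cell_id with set_index. In that case, drop=True discards the index outright, so use reset_index() without drop if the ids need to survive as a column:

```python
import pandas as pd

df = pd.DataFrame({"cell_id": [30, 10, 20], "area": [3.0, 1.0, 2.0]})

# Sort, then discard the shuffled labels in favour of a fresh 0..n-1 index.
prepped = df.sort_values("cell_id").reset_index(drop=True)
print(prepped.index.tolist())       # [0, 1, 2]
print(prepped["cell_id"].tolist())  # [10, 20, 30]

# If cell_id was moved into the index with set_index, drop=True
# throws it away; plain reset_index() moves it back into a column.
indexed = df.set_index("cell_id").sort_index()
print("cell_id" in indexed.reset_index().columns)           # True
print("cell_id" in indexed.reset_index(drop=True).columns)  # False
```
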

I think I sent a channel message saying I was going to automate that step, but I never did, so that might be the issue. If you did not run it, I will patch and push the fix, and also delete the dataset (but not the Iota) so you can re-ingest.
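
The comment above proposes automating this step inside the library. The real patch is not shown in this thread. The following is a hypothetical sketch of such a pre-upload step, not the datasetdatabase code. prepare_for_upload is an invented name, and the helper keeps a named index as a column rather than dropping it:

```python
import pandas as pd


def prepare_for_upload(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pre-upload step (not part of datasetdatabase).

    Guarantees a fresh 0..n-1 index. A named index (such as cell_id
    after set_index) is kept as a column; an unnamed one, such as the
    shuffled labels left by sort_values, is discarded.
    """
    has_name = any(name is not None for name in df.index.names)
    if not has_name and df.index.equals(pd.RangeIndex(len(df))):
        return df  # already a clean, unnamed default index
    return df.reset_index(drop=not has_name)


raw = pd.DataFrame({"cell_id": [30, 10, 20], "area": [3.0, 1.0, 2.0]})
print(prepare_for_upload(raw.sort_values("cell_id")).index.tolist())  # [0, 1, 2]
print(prepare_for_upload(raw.set_index("cell_id")).columns.tolist())  # ['cell_id', 'area']
```

Keeping a named index as a column is a design choice. It avoids silently losing ids like cell_id, at the cost of changing the frame's column set.
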

vianamp commented 6 years ago

I did not. Sorry. Please let me know when it is ready for reingest.

evamaxfield commented 6 years ago

No worries, totally my fault; I forgot to add that one line, lol.

evamaxfield commented 6 years ago

Do any other datasets need to be deleted as well?

vianamp commented 6 years ago

Thanks

evamaxfield commented 6 years ago

Those datasets have been purged, and the reset_index patch (v1.0.27) has been pushed. You should now be able to push your prepped dataframe into a dataset as-is, without having to worry about the index. Please try ingesting and pulling again. I will close this issue once you let me know that get_dataset worked properly.

evamaxfield commented 6 years ago

Tested after the patch; get_dataset runs to completion. Marking as closed.