Closed: rogerkuou closed this 1 year ago
Kudos, SonarCloud Quality Gate passed!
- 0 Bugs
- 0 Vulnerabilities
- 0 Security Hotspots
- 0 Code Smells
- No Coverage information
- 0.0% Duplication
@fnattino Thanks for the detailed check!
Actually, I have now found that if I remove the "external reading" part, the function still works the same, so I made another commit.
Most likely, the failure of dtype recognition was introduced by my previous implementation, where I tried to convert the entire dask dataframe to a dask array and then loop through all attributes. I have now rolled back to the original column-wise operation and perform the conversion per column (see this part). I found this does not affect my dask graph.
Would you like to take another look?
Fixes #43 and #44
Thanks @fnattino for spotting these issues.
Indeed, #43 should be caused by the ambiguous dtype: when writing to zarr, a chunk needs to be loaded to determine its dtype. I made a fix by loading the first 10 rows to determine the dtypes, then passing them to `dask.dataframe.read_csv`. For `pnt_id`, I specified the data type as `str`, so there will be no `object` dtype anymore.
I did the following test. Both the slow loading/warning and the inconsistent chunk size should have been fixed.
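Roughly, the fix works like this (a sketch with hypothetical function and column names; the real code lives in the linked commit): sample a few rows with pandas to pin down the dtypes, force the ID column to `str`, and hand the resulting mapping to `dask.dataframe.read_csv`.

```python
import io
import pandas as pd

def sniff_dtypes(csv_source, n_rows=10, str_columns=("pnt_id",)):
    """Read the first n_rows of a CSV to determine column dtypes.

    Columns listed in str_columns are forced to str so they never end
    up with the ambiguous ``object`` dtype when writing to zarr.
    """
    sample = pd.read_csv(csv_source, nrows=n_rows)
    dtypes = sample.dtypes.to_dict()
    for col in str_columns:
        if col in dtypes:
            dtypes[col] = str
    return dtypes

csv_text = "pnt_id,height\nA1,1.5\nA2,2.5\n"
dtypes = sniff_dtypes(io.StringIO(csv_text))

# The mapping can then be passed straight to dask, e.g.:
# import dask.dataframe as dd
# ddf = dd.read_csv("points.csv", dtype=sniff_dtypes("points.csv"))
```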
Could you check whether it is fixed on your side?