SciCatProject / scicat-filewriter-ingest

Python client that connects to a Kafka queue and creates new datasets when it is notified that a file has been written

What is `size` in the dataset descriptions? #54

Closed YooSunYoung closed 3 months ago

YooSunYoung commented 4 months ago

There were 3 different ways of calculating file size in the code. Can we choose one of them for all the size fields? I'll go with stats from a pathlib.Path object for now.
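For reference, a minimal sketch of the stats approach. The helper name `file_size_in_bytes` is hypothetical and not taken from `ingestor_lib.py`; it only illustrates reading the size from `pathlib.Path.stat()`.

```python
from pathlib import Path


def file_size_in_bytes(file_path: Path) -> int:
    """Return the file size in bytes as reported by the file system.

    Hypothetical helper for illustration; not the function used in the repo.
    """
    # Path.stat() returns an os.stat_result; st_size is the size in bytes.
    return file_path.stat().st_size
```

Using one helper like this for every size field would keep the values consistent with what the storage system reports.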

Length

https://github.com/SciCatProject/scicat-filewriter-ingest/blob/4ac33271f03271a32b7e3fc78fe0f39a052a5b2f/ingestor_lib.py#L326-L329

https://github.com/SciCatProject/scicat-filewriter-ingest/blob/4ac33271f03271a32b7e3fc78fe0f39a052a5b2f/ingestor_lib.py#L273-L276

From a Property

https://github.com/SciCatProject/scicat-filewriter-ingest/blob/4ac33271f03271a32b7e3fc78fe0f39a052a5b2f/ingestor_lib.py#L424-L435

Stats

https://github.com/SciCatProject/scicat-filewriter-ingest/blob/4ac33271f03271a32b7e3fc78fe0f39a052a5b2f/ingestor_lib.py#L707-L716

nitrosx commented 4 months ago

The size in the datafilelist element is the size reported by the storage system. The size in origdatablock is the total size of all the files listed in its datafilelist. The size in dataset is the total size of the files listed in all the datafilelist elements associated with the dataset.
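A rough sketch of how the three sizes relate, using plain dictionaries. The function names and the `path`, `size`, and `dataFileList` keys are assumptions made for illustration and may not match the ingestor's actual data structures.

```python
from pathlib import Path


def build_data_file_list(file_paths):
    """One entry per file; size is what the storage system reports."""
    # Assumed entry layout: {"path": ..., "size": ...}
    return [
        {"path": str(path), "size": Path(path).stat().st_size}
        for path in file_paths
    ]


def orig_datablock_size(data_file_list):
    """Total size of all files listed in one datafilelist."""
    return sum(entry["size"] for entry in data_file_list)


def dataset_size(orig_datablocks):
    """Total size over every datafilelist associated with the dataset."""
    # Assumed block layout: {"dataFileList": [...]}
    return sum(
        orig_datablock_size(block["dataFileList"]) for block in orig_datablocks
    )
```

Deriving the origdatablock and dataset sizes from the per-file entries, rather than computing them separately, means the totals cannot drift from the file sizes.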

nitrosx commented 3 months ago

@YooSunYoung Can we close this issue?

YooSunYoung commented 3 months ago

Fixed by #61