OpenCSGs / csghub

CSGHub is an open-source large model platform, similar to an on-premise version of Hugging Face. Through its user interface you can easily manage models and datasets, deploy model applications, and set up model fine-tuning or inference jobs. CSGHub also provides a Python SDK that is fully compatible with the Hugging Face (hf) SDK. Join us in building a safer and more open platform ⭐️
https://opencsg.com

Feature Request: Support for Additional File Formats in Data Preview #458

Status: Open. pingdoom opened this issue 1 month ago.

pingdoom commented 1 month ago

Hello CSGHub Team,

I hope this message finds you well. I've been exploring the CSGHub platform and I'm impressed with its capabilities, especially in managing large model assets and datasets. It's evident that a lot of thought and effort has gone into making CSGHub a comprehensive asset management platform.

One area where I believe CSGHub could be enhanced is in the support for additional file formats in the dataset preview functionality. Currently, CSGHub provides excellent support for previewing datasets in common formats. However, as datasets become increasingly complex and diverse, the need to support additional formats becomes apparent.

Enhancement Request: I would like to request the addition of support for the following file formats in the dataset preview functionality:

These formats are widely used in the data science and machine learning communities for storing large, complex datasets. Supporting these formats would significantly enhance the usability of CSGHub for a broader audience and facilitate more efficient data exploration and management.

Justification:

Potential Implementation: I understand that adding support for these formats may require considerable effort, so starting with HDF5, given its widespread use, could be a beneficial first step. Using existing open-source libraries to read these formats could also streamline the implementation.
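To make the "reuse existing libraries" idea concrete, here is a minimal sketch of a pluggable preview layer. It assumes a Python backend. All names (`preview`, `register`, `READERS`, the reader functions) are hypothetical and are not CSGHub code. CSV and JSONL use the standard library. The HDF5 reader relies on the open-source `h5py` package and is registered only if that package is installed. Each reader caps what it reads, so a preview never loads a whole file.

```python
import csv
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical sketch, not CSGHub code: map file suffixes to reader functions.
# Readers return a bounded number of records so previews stay cheap.
Reader = Callable[[Path, int], List[dict]]
READERS: Dict[str, Reader] = {}


def register(*suffixes: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a reader for one or more file suffixes."""
    def wrap(fn: Reader) -> Reader:
        for suffix in suffixes:
            READERS[suffix] = fn
        return fn
    return wrap


@register(".csv")
def read_csv(path: Path, limit: int) -> List[dict]:
    rows: List[dict] = []
    with path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # streams rows; stops after `limit`
            if len(rows) >= limit:
                break
            rows.append(dict(row))
    return rows


@register(".jsonl")
def read_jsonl(path: Path, limit: int) -> List[dict]:
    rows: List[dict] = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if len(rows) >= limit:
                break
            if line.strip():  # skip blank lines
                rows.append(json.loads(line))
    return rows


try:  # h5py is an existing open-source HDF5 reader; optional here
    import h5py
except ImportError:
    h5py = None

if h5py is not None:
    @register(".h5", ".hdf5")
    def read_hdf5(path: Path, limit: int) -> List[dict]:
        # One summary per dataset: name, shape, dtype, and the first `limit`
        # entries along axis 0 (scalar datasets get no head). Slicing reads
        # only the selected entries from disk.
        summaries: List[dict] = []

        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                head = obj[:limit].tolist() if obj.ndim > 0 else None
                summaries.append({"name": name, "shape": obj.shape,
                                  "dtype": str(obj.dtype), "head": head})

        with h5py.File(path, "r") as f:
            f.visititems(visit)
        return summaries


def preview(path: str, limit: int = 5) -> List[dict]:
    """Return a bounded preview of a supported file; raise for unknown suffixes."""
    p = Path(path)
    reader = READERS.get(p.suffix.lower())
    if reader is None:
        raise ValueError(f"no preview reader for {p.suffix!r}")
    return reader(p, limit)
```

With this design, calling `preview("train.csv", limit=3)` would return the first three rows as dictionaries. Supporting another format would only require registering one more reader function, and the dispatch code would stay the same.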

I believe that extending dataset preview capabilities to include these formats would make CSGHub even more versatile and valuable to the data science and machine learning communities.

Thank you for considering this enhancement request. I'm looking forward to seeing how CSGHub continues to evolve and meet the needs of its users.

SeanHH86 commented 1 month ago

@pingdoom Thanks for raising this and for providing more information and justification for previewing data in these formats. Dataset preview is a key feature for us, and we are receiving many requirements for it. Making these datasets previewable on CSGHub is on our roadmap, and we are currently working on the dataset viewer. There are also further factors we need to consider, including security, performance, and usability. We look forward to receiving more feedback from you.

Have a nice day!