CONP-PCNO / conp-dataset

A DataLad dataset for CONP
http://conp.ca
MIT License

Size or other cut-off points for building git-annex links #674

Open emmetaobrien opened 2 years ago

emmetaobrien commented 2 years ago

Right now, our general policy when ingesting new datasets is to build a git-annex link to every file in a dataset by default, with a couple of specific exceptions (README.md and DATS.json). However, the utility of building links rather than storing small files directly in GitHub is questionable. In tests with the microstructure_informed_connectomics dataset, which contains ~11,300 files, building git-annex links to every file took nearly twice as long as building links only to files larger than a 200 kB cut-off (estimated by manual examination of some subdirectories) and downloading the rest directly.

Do we want to consider size-based or other criteria (such as storing all text files directly) for deciding which files get git-annex links?

cmadjar commented 2 years ago

This might be tricky for datasets that require third-party accounts since small files can still include data that should not be in the open.

For fully open datasets, I don't see the harm in doing that. @emmetaobrien, maybe this could be added to the agenda for next week, or for the week after if we run out of time, since we might be doing the roadmap planning?

emmetaobrien commented 2 years ago

Indeed, I was only thinking of this as applying to open datasets.
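
To make the proposal above concrete, here is a minimal Python sketch of a size-based rule restricted to open datasets. It is an illustration only, not CONP's ingestion code: the function names, the `dataset_is_open` flag, and reading "200kb" as 200 × 1000 bytes are all assumptions. Only the README.md and DATS.json exceptions come from the existing policy described in this thread.

```python
from pathlib import Path

# Files the current policy already stores directly in git (named in this issue).
ALWAYS_IN_GIT = {"README.md", "DATS.json"}

# Hypothetical cut-off: the ~200 kB figure from the test, read as 200 * 1000 bytes.
SIZE_CUTOFF_BYTES = 200 * 1000


def needs_annex_link(path, dataset_is_open, cutoff=SIZE_CUTOFF_BYTES):
    """Return True if the file should get a git-annex link, False if it
    could be stored directly in git under the proposed rule."""
    path = Path(path)
    if path.name in ALWAYS_IN_GIT:
        return False
    if not dataset_is_open:
        # Datasets behind third-party accounts: small files may still hold
        # restricted data, so everything else stays behind git-annex.
        return True
    return path.stat().st_size > cutoff


def partition_files(root, dataset_is_open, cutoff=SIZE_CUTOFF_BYTES):
    """Walk a dataset directory and split its files into (annexed, direct)."""
    root = Path(root)
    annexed, direct = [], []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the repository's .git folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        if needs_annex_link(path, dataset_is_open, cutoff):
            annexed.append(path)
        else:
            direct.append(path)
    return annexed, direct
```

Calling `partition_files("path/to/dataset", dataset_is_open=True)` would return the two lists, which could be used to estimate how many links a given cut-off saves before changing any ingestion script. git-annex also has an `annex.largefiles` setting that accepts size expressions such as `largerthan=`; whether the CONP tooling could rely on it is not established here.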

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stalled for 3 months with no activity.
