Is your feature request related to a problem? Please describe.
We are reading https://childproject.readthedocs.io/en/latest/vandam.html to create tsimane2018, a dataset that already exists in some form. I'll be documenting thoughts & suggestions here
Describe the solution you'd like
"The first step is to create a new dataset named vandam-data :" Clarify that this needs to be done in a terminal, that users should first navigate to where they want the dataset to be created, and that this step creates the DataLad dataset itself, so it cannot be replaced by just creating the folders "by hand".
"Then, download the original data-set from HomeBank." Clarify that, for your own dataset, this step involves identifying the key files that you'll need for the conversion (see "Other thoughts").
Also, in this section, the user is implicitly asked to do some things in a browser and others in a terminal, without any indication of which is done where.
(Overall, I'm feeling that the best thing may be to have video logs of lots of dataset conversions...)
Other thoughts
The tutorial is great, but we may want to precede or follow it with a FAQ about how to get started from a dataset you already have that is organized differently. I'm really not certain. As we've discussed frequently, each dataset is unique, so this may rather be a guide for an expert user helping someone newer do their first import.
Perhaps the structure would be something like:
1) Preparation: identify all the files you need (raw recordings, raw metadata, raw annotation files). Note what their structure is, but you don't need to make changes yet.
2) Think about the easiest way to proceed: we've found that you probably need to make children.xlsx by hand, and many parts of recordings.xlsx as well, but you can use ls to list the files that you need to inventory in the recordings metadata. Consider using Excel, where formulas can calculate the start time for recordings made up of multiple files that need to be concatenated (and note there is a command for calculating duration, so you can create recordings.csv over several steps).
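As a rough illustration of step 2, the inventory and the start-time arithmetic could also be scripted instead of done with ls and Excel. This is only a sketch under assumptions: the helper names (`list_recordings`, `cumulative_offsets`, `write_draft`) are made up for this example and are not part of ChildProject, the `.wav` extension is a guess about the raw data, and the draft columns are just a starting point to be completed by hand.

```python
import csv
from pathlib import Path


def list_recordings(root, extensions=(".wav",)):
    """Return sorted paths (relative to root) of the audio files found under root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def cumulative_offsets(durations_s):
    """Start offset, in seconds, of each file when the files are concatenated in order."""
    offsets, total = [], 0
    for duration in durations_s:
        offsets.append(total)
        total += duration
    return offsets


def write_draft(filenames, out_csv):
    """Write a draft table with one row per file; the other columns are left empty."""
    # Hypothetical starting set of columns, to be filled in by hand afterwards.
    columns = ["recording_filename", "child_id", "date_iso", "start_time"]
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        for name in filenames:
            writer.writerow({"recording_filename": name})
```

For example, `write_draft(list_recordings("path/to/raw/recordings"), "recordings_draft.csv")` would produce a table to complete manually, and `cumulative_offsets` gives the offsets to add to the first file's start time once durations are known.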
Overall, I think this is more of a first tutorial that gives an idea of the steps involved. For a complete guide to creating a dataset, I think it is better to follow the handbook guide.