~If the open-data repo is forked or cloned, are the submodules only referenced rather than copied?~ In the test, the submodule command copied the full master branch of the target repository.
A few things would make this approach hard to use if there were code dependencies between repositories, but this is NOT the case for our requirements.
No duplication
~Risk of losing data if the third-party repository is deleted or moved~ No risk, as long as we do not update the submodule on our side after the third-party repository has been deleted.
Might not solve the problem with large files, just move them somewhere else on GitHub?
Not recommended for creating the main folders of the different datasets to be produced; better suited to adding subfolders within a dataset's main folder that point to user projects (i.e. GitHub repositories) using that dataset.
The idea was brought up by @erictleung.
Additional information:
Tested repo: a personal repo with data analysis of fCC data. For the test, see https://github.com/evaristoc/open-data
Main Results:
The submodule command copied the full master branch of the target repository.
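
For anyone who wants to reproduce the test locally, here is a rough sketch using throwaway repositories. It is not the exact sequence used in the test above: the names `user-project`, `open-data`, `dataset-a` and the `projects/` subfolder are all hypothetical stand-ins, and a local path replaces the GitHub URL so that it runs offline. The `protocol.file.allow=always` setting is only there because recent Git versions block local-path submodules by default.

```shell
#!/bin/sh
# Sketch only: every repository and folder name below is a hypothetical stand-in.
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for a third-party user project (would normally live on GitHub).
git init -q user-project
cd user-project
git config user.email "test@example.com"
git config user.name "Test User"
echo "analysis of the dataset" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..

# Stand-in for the open-data repository, with one dataset main folder.
git init -q open-data
cd open-data
git config user.email "test@example.com"
git config user.name "Test User"
mkdir dataset-a
echo "dataset description" > dataset-a/README.md
git add dataset-a
git commit -q -m "Add dataset main folder"

# Add the user project as a subfolder inside the dataset main folder.
# protocol.file.allow=always is needed only because the source is a local path.
git -c protocol.file.allow=always submodule add \
    "$work/user-project" dataset-a/projects/user-project
git commit -q -m "Add user project as a submodule"

# The parent repo stores only the path, the URL and a pinned commit.
cat .gitmodules
git submodule status

# Simulate the third-party repository being deleted: the files already
# checked out on our side stay in place as long as we do not update.
rm -rf "$work/user-project"
ls dataset-a/projects/user-project
```

With a real GitHub project, the `submodule add` line would take the repository's clone URL instead of a local path, and the `-c protocol.file.allow=always` part would not be needed. Note that the last step only shows that an existing checkout survives; a fresh clone of the parent repo would no longer be able to fetch the deleted submodule.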