CogStack / MedCATtutorials

General tutorials for the setup and use of MedCAT.
MIT License

Using different data folder for each part of tutorial #4

Closed mart-r closed 2 years ago

mart-r commented 2 years ago

The rationale behind this PR is as follows:

The "problem" I've described above is, of course, something of a byproduct of my running the tutorials multiple times with the download steps commented out (they take a long time for me, at least when working from home). Had I not been doing that, there would have been no incompatibility.

So my current "solution" is to create a separate folder for each part of the tutorial (e.g. data_p2) and, on top of that, to skip downloading a file unless the remote copy is newer than the local one (wget's -N flag).
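As a rough sketch of the idea (not the exact commands from the notebooks), a download step could look like the following. The helper name `fetch_for_part` and any URL passed to it are placeholders made up for illustration; only the `-N` (`--timestamping`) and `-P` (`--directory-prefix`) options are actual wget flags.

```shell
# Hypothetical helper: fetch one file into a per-part data folder.
# The function name and the data_<part> naming are illustrative assumptions.
fetch_for_part() {
    part="$1"                 # tutorial part label, e.g. "p2"
    url="$2"                  # file to download (placeholder, supplied by the caller)
    data_dir="data_${part}"   # one folder per tutorial part, e.g. data_p2

    mkdir -p "$data_dir"
    # -N: skip the download unless the remote copy is newer than the local one
    # -P: save the file under the given directory
    wget -N -P "$data_dir" "$url"
}
```

Calling, say, `fetch_for_part p2 <some-url>` twice in a row should only transfer the file the first time, provided the server reports a usable last-modified timestamp.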

Overall, the approach I've gone for has a number of pros and cons. I will try to list them here to the best of my abilities.

| Pros | Cons |
| --- | --- |
| Downloads are not repeated if file exists locally | Extra disk space is required on the user's side (in my case, it took around 6GB total) |
| Files are not shared between different parts of the tutorial | If files get changed locally, they will not automatically be redownloaded |

The current approach was to apply the -N flag to all wget downloads. If there are files for which this is not appropriate, these changes would need to be reverted.

PS: The problem arose, as I mentioned above, when I was trying to run the tutorials multiple times. The idea was to see how the tutorials fared with the current master branch of MedCAT. And because I had already downloaded the files in prior runs, I commented the download steps out. This led to the issue where Part 4.2 would read the file written by Part 3.1 and there would be an incompatibility.

PPS: This may not be the ideal solution to the "problem" at hand. And I'm more than open to other ideas and/or the dismissal of this PR.