clulab / reach

Reach Biomedical Information Extraction

Bioresources #734

Closed: enoriega closed this 3 years ago

enoriega commented 3 years ago

Integrated bioresources into reach. This PR addresses issue #730. It copies the files over, reconfigures build.sbt, and updates the shell scripts to account for the new directory structure.

The commit history is not carried over, but it will remain available in the legacy bioresources repository.

kwalcock commented 3 years ago

Being less involved in bioresources' past, I don't have a strong opinion about preserving the history, but it might be useful to review the repercussions for others. Version control systems in general, and git in particular, don't seem to do a good job of tracking changes across renamed and moved files (a pet peeve of mine). What will this look like?

MihaiSurdeanu commented 3 years ago

I don't know... But git scares me, and I would prefer to keep it simple. I think it's up to us to document changes in KB file names when they occur. This means that somebody who wants to track git changes to a file may need to take an extra step to map the new file to its old history in the original bioresources repo.

bgyori commented 3 years ago

Given that the original files will remain in the bioresources repo, we can look at the history up to this PR there. We will also be able to track new history in this repo after the move. The only issue is that we can't, e.g., run git blame across both the "new" and "old" history of a file, which I don't think is a huge issue. Furthermore, many of the files are stored as .tsv.gz, so it's hard to track exact content changes with git tools anyway. So I think moving the files over like this while keeping the original repo available is probably okay.

kwalcock commented 3 years ago

It looks like the relevant bioresources people have weighed in. It's easy for me to press the button. I'll count to 100 first, though, just in case.

MihaiSurdeanu commented 3 years ago

98, 99, ...

kwalcock commented 3 years ago

...in binary. It takes longer.

kwalcock commented 3 years ago

On 109 I remembered that bioresources/CHANGES.md needed to be removed and the main CHANGES.md updated. That will be handled separately now.

enoriega commented 3 years ago

The remaining tasks are taken care of in PR #735.