bridgedb / BridgeDb

The BridgeDb Library source code
https://bridgedb.org/
Apache License 2.0

Use extension .tsv so GitHub, etc. can recognize the file type #68

Closed: ariutta closed this issue 5 years ago

ariutta commented 6 years ago

Take a look at this version of our datasources file. It's formatted as a readable, searchable table! GitHub and tools like Tad recognize the filename extension .tsv but not .txt.

This change will make the file much easier to work with, both when using it in other programs and when maintaining it. For example, take a look at our current organisms file: it appears that some, but not all, of the species names are separated by tabs.
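As an illustration of the kind of check a consistently tab-separated file makes easy, here is a minimal Java sketch. It flags rows whose number of tab-separated fields differs from the header row's. The class name `TsvColumnCheck` and the sample rows are hypothetical; they are not taken from BridgeDb or from the real organisms file.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of BridgeDb): reports lines of a
 * tab-separated file whose column count differs from the header's.
 */
public class TsvColumnCheck {

    /** Returns the 1-based line numbers whose field count differs from line 1. */
    public static List<Integer> inconsistentLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        if (lines.isEmpty()) {
            return bad;
        }
        // limit -1 keeps trailing empty fields, so "x\t" counts as 2 columns
        int expected = lines.get(0).split("\t", -1).length;
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            int actual = line.split("\t", -1).length;
            if (actual != expected) {
                bad.add(i + 1); // convert 0-based index to 1-based line number
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up sample: the third line uses a space where a tab belongs
        List<String> sample = List.of(
                "name\tcode",
                "Homo sapiens\tHs",
                "Mus musculus Mm",
                "Rattus norvegicus\tRn");
        System.out.println(inconsistentLines(sample)); // prints [3]
    }
}
```

To check an actual file, one could pass `java.nio.file.Files.readAllLines(path)` to `inconsistentLines`. This assumes the first line is a header, which may not hold for every BridgeDb data file.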

ariutta commented 6 years ago

@ianwdunlop, could this change be integrated into your code without problems?

egonw commented 6 years ago

Sounds reasonable, but this is not the full patch, correct? I mean, we need to update the code accordingly too...

ariutta commented 6 years ago


Yes, definitely. I just made those updates. Each of the following, along with every reference to it, has now been updated:

- datasources.txt
- datasources_headers.txt
- organisms.txt
- DataSourceTxt.java
- datasourcesTxt
- IdentifiersOrgDataSource.txt
- DataSourceTxtTest
- generatedDatasources.txt

The one exception is the reference to DataSourceTxt in /UPGRADE_NOTES.md. I'm not sure whether it should simply be changed to DataSourceTsv (so that DataSourceTxt.init becomes DataSourceTsv.init();) or updated to say BioDataSource.init(); instead.

ariutta commented 6 years ago

> And I'd like to see this tested, but I'm not sure right now is the best time... maybe after the 2.3 release? Would that be early enough?

Sure, there's no rush.

ariutta commented 6 years ago

I tried to be conservative, so I simply changed DataSourceTxt to DataSourceTsv (matching the capitalization of each original term). However, it might make more sense to use BioDataSource or DataSourcesMetadata so that we aren't tied to a specific file format. I've already seen one or both of those terms in the codebase.
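A minimal Java sketch of the format-neutral idea, under stated assumptions: `DataSourcesMetadata` is treated here as a hypothetical interface (not an existing BridgeDb type), the TSV-backed implementation assumes the file has no header row and takes the system-code column index as a parameter, and the sample rows are made up. Callers depend only on the interface, so a later change of file format would mean writing a new implementation rather than another round of renaming.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class MetadataSketch {

    /** Hypothetical format-neutral view of the data source metadata. */
    public interface DataSourcesMetadata {
        List<String> systemCodes();
    }

    /** One possible implementation; only this class knows the data is tab-separated. */
    public static final class TsvDataSourcesMetadata implements DataSourcesMetadata {
        private final List<String> codes = new ArrayList<>();

        /**
         * Assumes no header row; codeColumn is the 0-based column holding the
         * system code. The given Reader is closed after reading.
         */
        public TsvDataSourcesMetadata(Reader in, int codeColumn) {
            try (BufferedReader reader = new BufferedReader(in)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.isEmpty()) {
                        continue; // skip blank lines
                    }
                    String[] fields = line.split("\t", -1);
                    if (fields.length > codeColumn) {
                        codes.add(fields[codeColumn]);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public List<String> systemCodes() {
            return List.copyOf(codes); // unmodifiable snapshot
        }
    }

    public static void main(String[] args) {
        // Made-up rows in the form: name <TAB> code
        DataSourcesMetadata meta = new TsvDataSourcesMetadata(
                new StringReader("Example A\tXa\nExample B\tXb\n"), 1);
        System.out.println(meta.systemCodes()); // prints [Xa, Xb]
    }
}
```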

egonw commented 5 years ago

Outdated. We're migrating to .tsv.