dspinellis / alexandria3k

Local relational access to openly-available publication data sets
GNU General Public License v3.0
79 stars 14 forks source link

Add city field to research_organizations table #34

Closed dtgupta closed 6 months ago

dtgupta commented 6 months ago

Add city field to research_organizations table and update corresponding tests. Adding city field to this table would make querying affiliations from specific cities efficient.

dspinellis commented 6 months ago

The PR is well done; thanks! However, while reviewing the code I saw that you're using addresses[0], so this should come from a one-to-many relationship. I then looked at the code, and saw that there is already a ror_addresses table containing a city field. Am I missing something?

dtgupta commented 6 months ago

Indeed, it may look as if the addresses field has a one-to-many relationship with multiple other addresses, but it's deceiving. I have looked at the ROR dataset and none of the organizations has a 2nd element under addresses.

I added city as a field to simply querying for the process I'm working on. The database cursor is unable to perform JOIN operation to get the city field from ror_addresses table. So it would require an additional time to narrow down the search results while matching. If you're still unsure about this, I am planning on creating another PR with the process soon, you can merge this branch after viewing the changes.

dspinellis commented 6 months ago

You are absolutely right:

$ jq '.[].addresses|length' v1.39-2023-12-20-ror-data.json  | sort -u
1

In this case all the elements of the addresses table, not just city should be folded into research_organizations, probably with field names prefixed by address_. A comment should indicate why this is the case.

dtgupta commented 6 months ago

Yes, that is a good idea! I have folded the addresses of research organizations into the main table, deleted the ror_addresses table and updated the tests. The previous commit reflects these changes.

dspinellis commented 6 months ago

Well done!