Closed dtgupta closed 6 months ago
The PR is well done; thanks! However, while reviewing the code I saw that you're using addresses[0]
, so this should come from a one-to-many relationship. I then looked at the code, and saw that there is already a ror_addresses
table containing a city
field. Am I missing something?
Indeed, it may look as if the addresses field has a one-to-many relationship with multiple other addresses, but it's deceiving. I have looked at the ROR dataset and none of the organizations has a 2nd element under addresses.
I added city
as a field to simply querying for the process I'm working on. The database cursor is unable to perform JOIN
operation to get the city
field from ror_addresses
table. So it would require an additional time to narrow down the search results while matching. If you're still unsure about this, I am planning on creating another PR with the process soon, you can merge this branch after viewing the changes.
You are absolutely right:
$ jq '.[].addresses|length' v1.39-2023-12-20-ror-data.json | sort -u
1
In this case all the elements of the addresses
table, not just city
should be folded into research_organizations
, probably with field names prefixed by address_
. A comment should indicate why this is the case.
Yes, that is a good idea! I have folded the addresses of research organizations into the main table, deleted the ror_addresses
table and updated the tests. The previous commit reflects these changes.
Well done!
Add city field to research_organizations table and update corresponding tests. Adding city field to this table would make querying affiliations from specific cities efficient.