Closed quadrismegistus closed 7 months ago
Merging #804 (f70e772) into develop (b3d034b) will increase coverage by
0.00%
. The diff coverage is98.60%
.
All the code changes look good, I think we mainly have questions about finalizing field specifics. I'm inclined to get this in front of Josh sooner rather than later while we decide the picky details. Should be straightforward to make remaining field tweaks after that, I think.
The one change I'd like to see before we merge for a first round of testing is to simplify the location export code to use an Address queryset and only use related accounts.
If you find any instances of addresses associated with person instead of account in production data, let's treat it as a data cleanup problem.
In this case, since we want a row per person per address, and if we return simply one Address per one call of get_export_data() -> dict
, and since addresses can contain more than one person, I'm not sure how we can guarantee to get one person per exported dict. There are 53 addresses with more than one person in the associated account.
There are also 3 addresses with 0 persons in the account:
In [8]: for addr in Address.objects.all():
...: if len(addr.account.persons.all())==0:
...: print(f'{len(addr.account.persons.all())} persons in account for address {addr}')
...:
0 persons in account for address 41 avenue du Maine, Paris — Account #320: 41 avenue du Maine
0 persons in account for address 220 boulevard Raspail, Paris — Account #7544: 220 boulevard Raspail
0 persons in account for address 41 rue Censier, Paris — Account #7652: 41 rue Censier
In this case, since we want a row per person per address, and if we return simply one Address per one call of
get_export_data() -> dict
, and since addresses can contain more than one person, I'm not sure how we can guarantee to get one person per exported dict. There are 53 addresses with more than one person in the associated account.
Here's what happens if we use get_object_data per address object:
def get_object_data(self, obj):
"""
Generate dictionary of data to export for a single
:class:`~mep.people.models.Person`
"""
addr = obj
loc = addr.location
persons = addr.account.persons.all()
# required properties
return dict(
# Member
member_id=[person.slug for person in persons],
member_uri=[absolutize_url(person.get_absolute_url()) for person in persons],
# Address data
start_date=addr.partial_start_date,
end_date=addr.partial_end_date,
care_of_person_id=addr.care_of_person.slug if addr.care_of_person else None,
# Location data
street_address=loc.street_address,
city=loc.city,
postal_code=loc.postal_code,
latitude=float(loc.latitude) if loc.latitude is not None else None,
longitude=float(loc.longitude) if loc.longitude is not None else None,
country=loc.country.name if loc.country else None,
arrondissement=loc.arrondissement(),
)
-> that uses semicolons to join the lists:
member_id member_uri care_of_person_id street_address postal_code city arrondissement country start_date end_date longitude latitude
coyle-kathleen;coyle-kestrel-2 https://shakespeareandco.princeton.edu/members/coyle-kathleen/;https://shakespeareandco.princeton.edu/members/coyle-kestrel-2/ 79 rue Denfert-Rochereau 75014 Paris 14 France 1/26/33 2/23/33 2.3343 48.8359
Which is fine but makes it not possible to easily join the csv from members to locations.
There are also 3 addresses with 0 persons in the account:
Thanks for finding. Please run by Josh and check if he knows whether they should be deleted or need to be associated with an account; hopefully he'll remember or be able to figure out. Maybe they were removed from an account but not deleted, or maybe they were duplicates. If need be, we can check if there's anything useful in the admin log entries.
In this case, since we want a row per person per address, and if we return simply one Address per one call of
get_export_data() -> dict
, and since addresses can contain more than one person, I'm not sure how we can guarantee to get one person per exported dict. There are 53 addresses with more than one person in the associated account.
It occurred to me after our last back and forth about this, addresses are like events - they are linked to accounts, not to people, and as you've noted we have a subset of accounts shared by people. My solution in the events dataset was a member_uris
field that included delimited member uris for multi-person accounts - my intent was that labeling this field should make it clear you have to choose how to handle shared accounts. Whatever we do, the events and addresses datasets should work the same way. I think Josh and I documented this pretty clearly in the dataset essay (both events dataset and the fact of shared accounts).
In this case, since we want a row per person per address, and if we return simply one Address per one call of
get_export_data() -> dict
, and since addresses can contain more than one person, I'm not sure how we can guarantee to get one person per exported dict. There are 53 addresses with more than one person in the associated account.It occurred to me after our last back and forth about this, addresses are like events - they are linked to accounts, not to people, and as you've noted we have a subset of accounts shared by people. My solution in the events dataset was a
member_uris
field that included delimited member uris for multi-person accounts - my intent was that labeling this field should make it clear you have to choose how to handle shared accounts. Whatever we do, the events and addresses datasets should work the same way. I think Josh and I documented this pretty clearly in the dataset essay (both events dataset and the fact of shared accounts).
Good point—we already are committed to joining multiple people in events table so for consistency let's just keep doing that. My latest commit is an example of that happening.
We need to customize/add to the JSON+CSV export commands we now have (
export_members
,export_books
,export_events
) to reflect the following:A new genre field called
category
on books, doable by modifyingexport_books
. No issue yetNew nationality and gender info on authors/creators, and indeed all info on creators, doable by writing a
export_creators
command that ideally subclasses/inherits most of the logic from theexport_members
command, since both draw on the samePerson
table. See #684New time-sensitive location information about who lived where when, doable by a new separate command,
export_locations
. See #790This PR has a minimal working version of (2) using the approach described there.