Living-with-machines / lwmdb

A django-based library for managing the Living with Machines newspapers metadata database schema
https://living-with-machines.github.io/lwmdb/
MIT License
2 stars 0 forks source link

Fix mitchells management command #99

Closed griff-rees closed 1 year ago

griff-rees commented 1 year ago

A core issue, which may actually be applicable to Census, Gazetteer and Newspaper commands, is inheritance from the custom Fixture class, which is currently in lwmdb.management.commands.fixtures.

griff-rees commented 1 year ago

A core necessity is providing a handler method, which seems to be included in some capacity but it doesn't pass the test. This is after an initial attempt to refactor the mitchells Command following the spec here: https://docs.djangoproject.com/en/4.2/howto/custom-management-commands/#module-django.core.management

self = <mitchells.management.commands.mitchells.Command object at 0xffff92b2e690>
args = ()
options = {'force_color': False, 'no_color': False, 'pythonpath': None, 'settings': None, ...}

    def handle(self, *args, **options):
        """
        The actual logic of the command. Subclasses must implement
        this method.
        """
>       raise NotImplementedError(
            "subclasses of BaseCommand must provide a handle() method"
        )
E       NotImplementedError: subclasses of BaseCommand must provide a handle() method

Prior to renaming, attempting to run the mitchells command directly raised a AttributeError: 'module' object has no attribute 'Command' error.

griff-rees commented 1 year ago

I think the core issue relates to inheriting from the Fixtures class. Any thoughts on this appreciated @kallewesterling

kallewesterling commented 1 year ago

I'll try to look at it later today/tomorrow, if that's OK?

kallewesterling commented 1 year ago

Sorry - accidentally closed!

griff-rees commented 1 year ago

No worries. I should probably push my local branch before then, will try to get that up before the meeting.

kallewesterling commented 1 year ago

Definitely! Realistically though, I'll probably only be able to return to this in the morning so you can do it at the end of today if you want :)

griff-rees commented 1 year ago

See the fix-mitchells-import branch for progress.

griff-rees commented 1 year ago

I think the error is actually in fixtures.py, see my comment below. There's no NLP column in the current configuration, hence those errors. How that fixture is generated is what I'm trying to clock, I'm guessing that's what's next to solve.

https://github.com/Living-with-machines/lwmdb/blob/fix-mitchells-import/lwmdb/management/commands/fixtures.py#L197

for _, row in df.iterrows():
      if not Newspaper.objects.filter(publication_code=row.NLP):
          self.stdout.write(
              self.style.WARNING(
                  f"NLP {row.NLP} not found in database => failed connection to   mitchells.Publication {row.entry}."
              )
          )
          continue
kallewesterling commented 1 year ago

I wrote a separate script in alto2txt2fixture, as I think that makes more sense (since it all needs to be connected up):

https://github.com/Living-with-machines/alto2txt2fixture/blob/main/create_adjacent_tables.py

It should be runnable after installing prerequisited using poetry.

It'll download any necessary files (including one file that is created by alto2txt, Newspaper-1.json)

The files are divided into these categories, which is good to remember:

image

The script create_adjacent_tables.py produces a number of outputs:

image

Each of the files correspond with an app + model (i.e. gazetteer.Place is expected to have a list of the items for the gazetteer app's Place model).

The CSV contains structured data for the table:

image

The JSON contains the same data, but structured for easy Django import:

image

Foreign keys should be set correctly. See this example for gazetteer.Place, which links to

image

ManyToMany relations are also defined, like in the mitchells.Entry model -- see political_leanings and prices values:

image

This should work, for import with the JSON files that I created in the latest generating (May 2023).

A few pending questions:

griff-rees commented 1 year ago

Thanks so much @kallewesterling, down to the detailed screen shots, multiple file formats and details down to the file name structure! I'll get a version of this running locally and update you.

Given the urgency: I could make https://github.com/Living-with-machines/alto2txt2fixture/ a dependency even while it's still a private repo. Could try to release that publicly in the future.

kallewesterling commented 1 year ago

I can see if I can get that opened publicly today if you'd like. I don't think there's anything stopping that, really, but I'll double-check.

griff-rees commented 1 year ago

Yeah that would be lovely. Makes releasing lwmdb easier too! Can try to get that up on PyPI for you as well.

kallewesterling commented 1 year ago

It's open!

griff-rees commented 1 year ago

I think this can now be closed given mitchells is importing. There may be further issues with the import but the pipeline now works.