SciCatProject / scicat-filewriter-ingest

Python client that connects to a kafka queue and creates new datasets when receiving notification that a file has been written
0 stars 0 forks source link

Refactoring filewriter ingestor. #2

Closed YooSunYoung closed 4 months ago

YooSunYoung commented 4 months ago

Brainstorming what should be done for refactoring. Feel free to shoot down the ideas : D

Encapsulating partial code into modules/functions

For example, these can be split into another module...? https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/online_ingestor.py#L116-L124

Kafka Consumer/Message Handling

As it is carrying the core-logic, we maybe better wrap the confluent kafka interfaces so that we can easily test them as well...? For example, https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/online_ingestor.py#L71-L77 This kind of error-handling can be separated into a smaller function. Or Wrap this into a function or automate this somehow...? https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/online_ingestor.py#L32-L40

Logging

Logging seems like to require some configuration steps but despite of the configuration the logging interface logger.log(info/debug/error) should be the same. But since the configuration of the logger requires file or link handling, it'd better be tested I guess...?

Ingestor

This guy definitely seems like to need some chopping. https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L222

For example, I think this should be wrapped into a separate file handling interface so that we can test them. https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L258-L261

Redefining Interfaces

Nexus structure path

https://github.com/SciCatProject/scicat-filewriter-ingest/blob/3c5c22ea4e606719e55b7999e407f18470b4e76c/ingestor_lib.py#L28-L33

This mixture of filter and indexing can maybe refactored into more explicit way...? But I also think it is already working conveniently already... hmm...

Ingestor

Documentation / Docstrings

Maybe we can collect the comments into docstrings. Copilot is good at it : D And the documentation in the readme might be sufficient but we probably need to clean that up too. Here is a list of some sections that might be needed in the documentation.

YooSunYoung commented 4 months ago

It is now turned into a milestone https://github.com/SciCatProject/scicat-filewriter-ingest/milestone/1