Can't use StripTransform in .yml file

OceanDataTools / openrvdas

An open source data acquisition system designed for use on research vessels and other scientific installations

http://openrvdas.org

Other

39 stars 20 forks source link

Can't use StripTransform in .yml file #391

Closed veggiemike closed 3 months ago

veggiemike commented 3 months ago

Ahoy @davidpablocohn!

I just realized I cannot use a StripTransform in v1.9.0. It looks like it's never imported by listen.py. A quick audit revealed the following lines appear to be missing from the top of listen.py... Any reason any of these were excluded (e.g., they don't actually work)?

from logger.transforms.delta_transform import DeltaTransform
from logger.transforms.derived_data_transform import DerivedDataTransform
from logger.transforms.format_transform import FormatTransform
from logger.transforms.interpolation_transform import InterpolationTransform
from logger.transforms.nmea_checksum_transform import NMEAChecksumTransform
from logger.transforms.regex_replace_transform import RegexReplaceTransform
from logger.transforms.select_fields_transform import SelectFieldsTransform
from logger.transforms.split_transform import SplitTransform
from logger.transforms.strip_transform import StripTransform
from logger.transforms.subsample_transform import SubsampleTransform

I've imported all of them on my server and things seem to work, although I've honestly only tested StripTransform.

webbpinner commented 3 months ago

The way I've handled this in the past is to add the module arg when invoking a reader/transform/writer that's not already imported. i.e.

transforms:
    - class: SplitTransform
      module: logger.transforms.split_transform

davidpablocohn commented 3 months ago

Hi Mike!

Good catch. This is basically me being inconsistent in design philosophy and driving down the middle of the road. When I originally wrote listen.py, I had it import every reader/writer/transform under the sun. Then, as transforms multiplied, I felt like it started to get cumbersome - should it really include everything, even though most of it won't ever be used? How much does that slow down startup (importing, reading, parsing all that extra code)? So figured that, beyond the original set, I'd just count on folks specifying where each component lived, using the 'module' specification, as Webb describes above.

But I realize I've not made that explicit in the documentation.

Would love your thoughts (and Webb's) on which way to go on this.

On Sun, May 19, 2024 at 4:39 AM Webb Pinner @.***> wrote:

The way I've handled this in the past is to add the module arg when invoking a reader/transform/writer that's not already imported. i.e.

transforms:

class: SplitTransform module: logger.transforms.split_transform

— Reply to this email directly, view it on GitHub https://github.com/OceanDataTools/openrvdas/issues/391#issuecomment-2119204156, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFO7V3UKEZ5KX7CO34YGHYTZDCFPFAVCNFSM6AAAAABH6E653KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJZGIYDIMJVGY . You are receiving this because you were mentioned.Message ID: @.***>

webbpinner commented 3 months ago

Would something like this work: Use the reader/transforms/writers/ __init__.py files to import all the readers/transforms/writers. i.e.

#logger.readers.__init__.py

from . import cached_data_reader
from . import logfile_reader
from . import network_reader
from . import tcp_reader
from . import udp_reader
from . import redis_reader
from . import serial_reader
from . import text_file_reader
from . import database_reader
from . import composed_reader

Then listen.py would just:

import readers from logger.readers
import transforms from logger.transforms
import writers from logger.writers

This approach would require any new readers/transforms/writers to be added to their corresponding __init__.py files when they are added to the repository but other files wanting to import all available readers/transforms/writers wouldn't have to change.

This doesn't take into account performance ramifications.

veggiemike commented 3 months ago

The way I've handled this in the past is to add the module arg when invoking a reader/transform/writer that's not already imported. i.e.
transforms:
    - class: SplitTransform
      module: logger.transforms.split_transform

That hadn't occurred to me at first. In my mind that was just for local additions, like module: local.nautilus.UglyTransform.

veggiemike commented 3 months ago

Would something like this work: Use the reader/transforms/writers/ __init__.py files to import all the readers/transforms/writers. i.e.
#logger.readers.__init__.py

from . import cached_data_reader
from . import logfile_reader
from . import network_reader
from . import tcp_reader
from . import udp_reader
from . import redis_reader
from . import serial_reader
from . import text_file_reader
from . import database_reader
from . import composed_reader
Then listen.py would just:
import readers from logger.readers
import transforms from logger.transforms
import writers from logger.writers
This approach would require any new readers/transforms/writers to be added to their corresponding __init__.py files when they are added to the repository but other files wanting to import all available readers/transforms/writers wouldn't have to change.

This doesn't take into account performance ramifications.

My kneejerk reaction is that I'd hate to have to specify the module all over the place in the .yml files for "built in" transforms, readers, or writers. (I haven't tested this, do I need to do it everywhere or just the first occurnce?) I feel like as a writer of a .yml file, I shouldn't have to know in advance which modules are imported for me already... I should be able to assume all the supplied modules are ready to use.

I like the idea of controlling it via the __init__ files instead of groups of imports in listen.py, if that can be done with minimal effort. I actually was surprised to see them all empty w/out any import or prep code in them.

Regarding performance, I'm unsure, but loading modules that don't have much/any init code in them shouldn't be that impactful. I haven't bumped into any dynamic code that runs at module load-time in any of the files of poked at. Once everything gets loaded once, and .pyc files created, I'd view it as as close to free as you can get with Python. Now, if we start trying to dynamically load things by globbing on the filesystem, that would get slow (although again, I don't know how slow).

So, I personally think we should either just add the missing imports to listen.py (with the intent to always add new imports three when we add new classes) or do it in a series of __init__ files, whichever is easier. I should mention, I did not double-check Reader/Writers when I came up with that list of missing Transforms.

davidpablocohn commented 3 months ago

Okay - I'm going to have a go at implementing the init.py solution. Will keep you posted.

On Sun, May 19, 2024 at 1:48 PM Michael D Labriola @.***> wrote:

Would something like this work: Use the reader/transforms/writers/ init.py files to import all the readers/transforms/writers. i.e.

logger.readers.init.py

from . import cached_data_reader from . import logfile_reader from . import network_reader from . import tcp_reader from . import udp_reader from . import redis_reader from . import serial_reader from . import text_file_reader from . import database_reader from . import composed_reader

Then listen.py would just:

import readers from logger.readers import transforms from logger.transforms import writers from logger.writers

This approach would require any new readers/transforms/writers to be added to their corresponding init.py files when they are added to the repository but other files wanting to import all available readers/transforms/writers wouldn't have to change.

This doesn't take into account performance ramifications.

My kneejerk reaction is that I'd hate to have to specify the module all over the place in the .yml files for "built in" transforms, readers, or writers. (I haven't tested this, do I need to do it everywhere or just the first occurnce?) I feel like as a writer of a .yml file, I shouldn't have to know in advance which modules are imported for me already... I should be able to assume all the supplied modules are ready to use.

I like the idea of controlling it via the init files instead of groups of imports in listen.py, if that can be done with minimal effort. I actually was surprised to see them all empty w/out any import or prep code in them.

Regarding performance, I'm unsure, but loading modules that don't have much/any init code in them shouldn't be that impactful. I haven't bumped into any dynamic code that runs at module load-time in any of the files of poked at. Once everything gets loaded once, and .pyc files created, I'd view it as as close to free as you can get with Python. Now, if we start trying to dynamically load things by globbing on the filesystem, that would get slow (although again, I don't know how slow).

So, I personally think we should either just add the missing imports to listen.py (with the intent to always add new imports three when we add new classes) or do it in a series of init files, whichever is easier. I should mention, I did not double-check Reader/Writers when I came up with that list of missing Transforms.

— Reply to this email directly, view it on GitHub https://github.com/OceanDataTools/openrvdas/issues/391#issuecomment-2119355429, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFO7V3WQFT6SFYTGDJUSISLZDEFZ7AVCNFSM6AAAAABH6E653KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJZGM2TKNBSHE . You are receiving this because you were mentioned.Message ID: @.***>

veggiemike commented 3 months ago

Okay - I'm going to have a go at implementing the init.py solution. Will keep you posted.

Cool! FYI, I haven't noticed any performance difference explicitly importing all the transform modules via listen.py

davidpablocohn commented 3 months ago

The one change I've had to make (branch issue_391) is that instead of Webb's

from . import cached_data_reader from . import logfile_reader

I'm having to do

from .cached_data_reader import CachedDataReader from .composed_reader import ComposedReader

And in listen.py I've gone with

from logger.readers import *

Otherwise, in the code (and configs) you have to invoke it as cached_data_reader.CachedDataReader. I don't see a clean way to do it without "*" imports.

Do either of you have ideas on a cleaner way to do it, or shall I go with this?

On Tue, May 21, 2024 at 2:56 PM Michael D Labriola @.***> wrote:

Okay - I'm going to have a go at implementing the init.py solution. Will keep you posted.

Cool! FYI, I haven't noticed any performance difference explicitly importing all the transform modules via listen.py

— Reply to this email directly, view it on GitHub https://github.com/OceanDataTools/openrvdas/issues/391#issuecomment-2123504537, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFO7V3WO3ZQKRBW5CBIAJGLZDO7IVAVCNFSM6AAAAABH6E653KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRTGUYDINJTG4 . You are receiving this because you were mentioned.Message ID: @.***>

veggiemike commented 3 months ago

The one change I've had to make (branch issue_391) is that instead of Webb's from . import cached_data_reader from . import logfile_reader I'm having to do from .cached_data_reader import CachedDataReader from .composed_reader import ComposedReader And in listen.py I've gone with from logger.readers import Otherwise, in the code (and configs) you have to invoke it as cached_data_reader.CachedDataReader. I don't see a clean way to do it without "" imports. Do either of you have ideas on a cleaner way to do it, or shall I go with this?

I was afraid of that. I did a quick little test myself the other day, couldn't get it right, and just figured my skills were rusty.

If we can't get "*" imports working, and we're left having to invoke as cached_data_reader.CachedDataReader, that's going to break every yaml config file in the universe and make them all that much harder to read... I'd rather live with having the list of import statements in the not-obvious listen.py.

davidpablocohn commented 3 months ago

It works fine with "from logger.readers import " - it's just that importing "" is considered poor form.

On Tue, May 21, 2024 at 4:12 PM Michael D Labriola @.***> wrote:

The one change I've had to make (branch issue_391) is that instead of Webb's from . import cached_data_reader from . import logfile_reader I'm having to do from .cached_data_reader import CachedDataReader from .composed_reader import ComposedReader And in listen.py I've gone with from logger.readers import Otherwise, in the code (and configs) you have to invoke it as cached_data_reader.CachedDataReader. I don't see a clean way to do it without "" imports. Do either of you have ideas on a cleaner way to do it, or shall I go with this?

I was afraid of that. I did a quick little test myself the other day, couldn't get it right, and just figured my skills were rusty.

If we can't get "*" imports working, and we're left having to invoke as cached_data_reader.CachedDataReader, that's going to break every yaml config file in the universe and make them all that much harder to read... I'd rather live with having the list of import statements in the not-obvious listen.py.

— Reply to this email directly, view it on GitHub https://github.com/OceanDataTools/openrvdas/issues/391#issuecomment-2123576350, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFO7V3XQHMOKQQDUTQT46FLZDPIGTAVCNFSM6AAAAABH6E653KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRTGU3TMMZVGA . You are receiving this because you were mentioned.Message ID: @.***>

veggiemike commented 3 months ago

It works fine with "from logger.readers import " - it's just that importing "" is considered poor form.

I misunderstood. This seems like a perfectly legit use case for import * to me. I'd just do it. :-)

davidpablocohn commented 3 months ago

Will do! Running tests now...

On Tue, May 21, 2024 at 5:19 PM Michael D Labriola @.***> wrote:

It works fine with "from logger.readers import " - it's just that importing "" is considered poor form.

I misunderstood. This seems like a perfectly legit use case for import * to me. I'd just do it. :-)

— Reply to this email directly, view it on GitHub https://github.com/OceanDataTools/openrvdas/issues/391#issuecomment-2123629578, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFO7V3UAV5TP7N5L35QTNILZDPQBDAVCNFSM6AAAAABH6E653KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRTGYZDSNJXHA . You are receiving this because you were mentioned.Message ID: @.***>

davidpablocohn commented 3 months ago

Pushed to master cbb5638..4b0b5cc