dchaplinsky / thebeast

MIT License

Resolver to generate abstract entity id #14

Open yharahuts opened 1 year ago

yharahuts commented 1 year ago

Let's say we have two CSV files and two mappings: one for companies (company_num, name, address) and one for directors (director_id, name, company_num). This is a standard situation for a MySQL export from different tables.

When generating entities from them, we can't link a director to a company, simply because we can't get the company entity_id from the directors mapping (and vice versa).

The only way for now is to create two entities, one 'full' from the companies file (with all company fields: name, address, etc.) and one 'stub' from the directors file (containing only company_num, since it's the only company field available there), and then merge them outside Beast.
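
To make the workaround concrete, here is a rough sketch of what such a merge outside Beast could look like. The dict layout (id, schema, properties as lists of values) and the `merge_entities` helper are assumptions made up for illustration, not something Beast provides.

```python
from collections import defaultdict


def merge_entities(entities):
    """Merge entity dicts that share an id, unioning their property values.

    Hypothetical helper: the entity layout here is assumed for illustration.
    """
    merged = {}
    for entity in entities:
        target = merged.setdefault(
            entity["id"],
            {
                "id": entity["id"],
                "schema": entity["schema"],
                "properties": defaultdict(list),
            },
        )
        for prop, values in entity["properties"].items():
            for value in values:
                # Keep each distinct value once, preserving first-seen order.
                if value not in target["properties"][prop]:
                    target["properties"][prop].append(value)
    # Convert the defaultdicts back to plain dicts for the caller.
    return {
        eid: {**e, "properties": dict(e["properties"])}
        for eid, e in merged.items()
    }


# 'Full' company from the companies file, 'stub' from the directors file.
full = {
    "id": "1234abcd",
    "schema": "Company",
    "properties": {
        "registrationNumber": ["42"],
        "name": ["Acme Ltd"],
        "address": ["1 Main St"],
    },
}
stub = {
    "id": "1234abcd",
    "schema": "Company",
    "properties": {"registrationNumber": ["42"]},
}

result = merge_entities([full, stub])
```

The stub adds nothing new here, so the merged company is just the full one; the point is that this extra step has to live outside the mapping.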

My proposal is to create a resolver that generates and returns an entity_id without generating the entity.

E.g., for the companies file we will have:

# just a simple Company entity
entities:
  company:
    schema: Company
    keys:
      - entity.registrationNumber
      # let's assume we get the key '1234abcd' from this input
    properties:
      registrationNumber:
        column: company_num
      name:
        column: name
      address:
        column: address

And for directors we will have:

entities:
  person:
    schema: Person
    keys:
      - entity.name
    properties:
      name:
        column: name
      # ... and other person's fields we have

  directorship:
    schema: Directorship
    keys:
      - entity.organization
      - entity.director
    properties:
      director: # we can link the director with the entity resolver
        entity: person
      organization: # but we can't link an organization, because its entity is created in another file/mapping
        our_custom_entity_id_resolver: # instead we will use our resolver
          - column.company_num # and give it the list of columns to generate the entity_id from
                               # this creates an entity_id without creating an entity;
                               # the same column/value as in the companies file yields the same id '1234abcd',
                               # which lets us link the entity from the other file
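
The idea relies on the id being a deterministic function of the key values, so the same company_num gives the same id in both mappings. A minimal sketch of that property follows. The `make_entity_id` function and the SHA-1 scheme are assumptions for illustration; the real resolver would have to reproduce whatever key derivation Beast actually uses.

```python
import hashlib


def make_entity_id(key_values):
    """Hash a list of key values into a deterministic id (hypothetical scheme)."""
    digest = hashlib.sha1()
    for value in key_values:
        # Separator byte so that ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(b"\x1f")
        digest.update(str(value).strip().encode("utf-8"))
    return digest.hexdigest()


# One row from each file, sharing the same company_num.
company_row = {"company_num": "42", "name": "Acme Ltd", "address": "1 Main St"}
director_row = {"director_id": "7", "name": "Jane Doe", "company_num": "42"}

# Id produced while building the Company entity from the companies file.
company_id = make_entity_id([company_row["company_num"]])

# Id produced by the proposed resolver in the directors mapping,
# without building any Company entity there.
organization_ref = make_entity_id([director_row["company_num"]])
```

Both calls see the same value, so `organization_ref` equals `company_id` and the Directorship can point at the company defined in the other file.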

I'm not sure whether this is a good idea; I need your thoughts, @dchaplinsky.

dchaplinsky commented 1 year ago

Well, there are two more ways.

  1. Generate an entity fragment of type Company from the directors file, which will only have an entity id. It'll generate a single statement, with the key hashed the same way as for the company from the company table, which you can reference from the person. For the same company you'll then have two entities: a very shallow one (key only) and the full one. Their keys will be the same as long as you follow the same pattern for the surrogate key.
  2. Option 1, but with a flag on that fragment entity, like virtual=True. You can then use that entity to reference the company, but it won't be exported.
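
A rough sketch of how option 2 could behave at export time is below. The `Entity` dataclass, the `virtual` attribute and the `export` function are all hypothetical names used for illustration, not existing Beast code.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    schema: str
    properties: dict = field(default_factory=dict)
    virtual: bool = False  # hypothetical flag: usable as a reference, never exported


def export(entities):
    """Return only the entities that should be written out."""
    return [e for e in entities if not e.virtual]


# Key-only Company fragment built from the directors file.
company_stub = Entity(id="1234abcd", schema="Company", virtual=True)
person = Entity(id="p-1", schema="Person", properties={"name": ["Jane Doe"]})
directorship = Entity(
    id="d-1",
    schema="Directorship",
    properties={"director": [person.id], "organization": [company_stub.id]},
)

exported = export([company_stub, person, directorship])
```

The Directorship still carries the company id in `organization`, while the stub itself is dropped from the output, so no shallow duplicate of the company is exported.
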

yharahuts commented 1 year ago

The first option will only work for statements export, I guess?

Option 2 is really nice, I like it.