Zenysis / Harmony

The Harmony Analytics Platform (Harmony), developed by Zenysis Technologies, helps make sense of messy data by transforming, cleaning and enriching data from multiple sources. https://www.zenysis.com/#harmony
GNU Affero General Public License v3.0

Docker dev environment (for web + pipeline) #78

Closed. Sybrand closed this 1 year ago

Sybrand commented 1 year ago

Overview

Try it out and let me know what you think, and whether this works for you. I was going to do only web, but decided to throw in both web and pipeline since I didn't hear anything from anyone and had the time.

Details

Test

Follow instructions in README, in "Local development setup" section.

Notes about flow

Flow is an issue. The version of flow we currently have is not compatible with arm64. arm64 builds are available (if you install the binary manually; for some reason not via npm!), but we'd have to make a lot of changes outside the scope of this PR to bring the codebase up to what that version of flow expects.

jbinary commented 1 year ago

What is the problem that is solved with this ticket? I mean, I'll need flow and mypy to work anyway, so those should be set up outside of docker thus I still need to set up many things, and what's the matter of having this dockerized thing then?

Sybrand commented 1 year ago

Harmony doesn't currently include flow or mypy, so I've excluded them.

Adding mypy is super easy - not a problem. If Harmony brings it in, we can just throw it in.

Flow is a different story. Flow doesn't actually have a distribution for arm on linux as far as I can tell - so it may not actually be a relevant option.

As to what problem the ticket is solving: supporting and documenting how to install and run the application on Mac (x86 + arm), Ubuntu, and WSL is very burdensome.

Sybrand commented 1 year ago

Ok - looks like flow and mypy (and translation!) were an omission from the Harmony release cut. I'll quickly try to add some of it back in this PR, but flow will fail on M1 since it's not supported (which gives me an opportunity to say we should drop flow and switch to TypeScript...).
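One way to keep a shared setup usable on both architectures is to skip the flow step where the thread says it cannot run. The following is only a minimal sketch under that assumption: `should_run_flow` and `ARM_MACHINES` are hypothetical names, not part of Harmony, and the rule "skip on arm64/aarch64" simply encodes the claim above that the pinned flow version does not work there.

```python
import platform

# Machine names commonly reported for 64-bit ARM
# (macOS reports "arm64", Linux usually reports "aarch64").
ARM_MACHINES = {"arm64", "aarch64"}


def should_run_flow(machine=None):
    """Return False on 64-bit ARM, where the thread reports flow is unsupported.

    `machine` defaults to the value reported by the current interpreter.
    """
    machine = (machine or platform.machine()).lower()
    return machine not in ARM_MACHINES


if __name__ == "__main__":
    if should_run_flow():
        print("flow check enabled")
    else:
        print("skipping flow: not supported on this architecture")
```

A dev script could call this before invoking flow, so that the rest of the checks still run on M1 machines.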

tahitihat commented 1 year ago

Flagging that I updated the README but intentionally didn't touch the local dev setup section so that merging will be straightforward

Sybrand commented 1 year ago

There are a lot of comments floating around, so here is a summary of the items that are most pressing to me. These are all things that work in the current Harmony setup flow, even if they’re a little time-consuming, so it’s important to maintain that:

  • Hasura isn’t running correctly and so the web server has errors. See #78 (comment)
  • I left commentary on which dev dependencies are required. There’s some that are either just useful commands to be able to run or else I couldn’t figure out what they did; no preference on what to do for those. There’s also an issue where pigz is on that list, but not installing correctly and causing errors when the pipeline runs (see #78 (comment)). There’s also an issue where running minio gives an error (see #78 (comment)).
  • Regarding DATABASE_URL being hardcoded all over the place: what do you think about having this setup use the hardcoded values, so that they don’t need to be changed? All of that can then be cleaned up in Harmony and our repo in a later refactor. See #78 (comment)
  • There’s an error running yarn translations. See #78 (comment)
  • With the inclusion of yarn translations and mypy, both of those commands show changes or errors: translations updates 10 files and mypy reports over 400 errors. The translations changes should be fairly quick to include: just review the updated files and double-check that everything looks alright; the changes appear to be about removed enterprise features. The mypy issues seem to be caused by libraries we import not being typed, so I’m guessing it’s because mypy.ini is not included in Harmony. However, adding that file makes mypy start crashing, so I’m not sure whether mypy should be split into another PR or stay in here.

And not a critical issue, but most of the provided commands give warnings like

WARN[0000] The "ZEN_ENV" variable is not set. Defaulting to a blank string. 
WARN[0000] The "DEFAULT_DRUID_HOST" variable is not set. Defaulting to a blank string. 

even if I'm running the exact provided command, which is a little confusing.
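For context, Docker Compose prints this warning whenever the compose file references a variable such as `${ZEN_ENV}` that is not set in the shell or in an `.env` file, regardless of which command is being run. The sketch below is a rough illustration of that rule, not Compose's actual implementation: `unset_compose_variables` is a hypothetical helper, `SAMPLE` is an invented compose fragment (including the `LOG_LEVEL` entry), and only the plain `$NAME` and `${NAME}` forms are considered; references with a default such as `${NAME:-value}` are ignored.

```python
import re

# Plain references only: ${NAME} or $NAME. Forms with a default,
# such as ${NAME:-value}, are deliberately not matched.
_PLAIN_REF = re.compile(
    r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)"
)


def unset_compose_variables(compose_text, env):
    """List variables referenced in compose_text that are missing from env."""
    # "$$" is an escaped literal dollar sign, so drop it before matching.
    text = compose_text.replace("$$", "")
    missing = []
    for braced, bare in _PLAIN_REF.findall(text):
        name = braced or bare
        if name not in env and name not in missing:
            missing.append(name)
    return missing


# Hypothetical compose fragment, for illustration only.
SAMPLE = """\
services:
  web:
    environment:
      ZEN_ENV: ${ZEN_ENV}
      DEFAULT_DRUID_HOST: ${DEFAULT_DRUID_HOST}
      LOG_LEVEL: ${LOG_LEVEL:-info}
"""

if __name__ == "__main__":
    import os

    for name in unset_compose_variables(SAMPLE, os.environ):
        print(f'"{name}" is not set; Compose would substitute a blank string')
```

Setting the listed variables in the shell or in an `.env` file next to the compose file, or giving them defaults in the compose file, is the usual way to silence the warning.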

  • Re. Hasura: fixed.
  • Re. dev dependencies: I think I've addressed them all, and added lots of notes in the Dockerfile for future reference.
  • Re. DATABASE_URL: I'm just totally ignoring that right now, if that's OK. I made a note for a future PR where we address it once and for all.
  • Re. yarn translations: fixed. The actual output is a job for future translation work.
  • Re. mypy: I'm just going to pull that out. There's all kinds of complexity introduced here (separate PR!).