slimreaper35 commented 2 months ago

Maintainers will complete the following section

[ ] Commit messages are descriptive enough
[ ] Code coverage from testing does not decrease and new code is covered
[ ] Docs updated (if applicable)
[ ] Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI pipeline will not run automatically. After verifying that the CI is safe to run:

approve GitHub Actions workflows by clicking a button
approve the Red Hat Trusted App Pipeline container build by commenting /ok-to-test (as is the standard for Pipelines as Code)

slimreaper35 commented 2 months ago

New push:

I changed the parsing logic completely, dropped vendored third-party package, and used the bundler standard library
- Missing unit tests + need to add bundler to cachi2 image
We should choose one python name for Gemfile.lock (gemlock, gemfile_lock, lockfile, gemlock_file , etc.) that only will be used
We should choose one: path or subpath in the context of path dependencies

slimreaper35 commented 1 month ago

New push:

I dropped purl properties, which are leftover from high-level code
I dropped urlparse since there is no validation of URL at all, urlparse accepts empty strings btw I wouldn't rely on the bundler library here, neither
Instead there is AcceptedUrl type derived from pydantic
typo fix + used **dep based on Alexey's suggestion

eskultety commented 1 month ago

* I dropped `urlparse` since there is no validation of URL at all, `urlparse` accepts empty strings btw
  I wouldn't rely on the bundler library here, neither

Good to know, anything in particular to share?

slimreaper35 commented 1 month ago

Good to know, anything in particular to share?

Yeah, forgot to add examples. I was testing our lockfile_parser.rb script what kind of malformed URLs it can handle, and:

http is supported by bundler, well okay
but remote: //github.com/3scale/json-schema.git is valid, also remote: hello is apparently valid
however, the example you provided https://gi#t%hub.com/3scale/json-schema.git does not pass

eskultety commented 1 month ago

FYI the unit tests fail in local environments if Ruby isn't installed which is currently assumed as a hard dependency and it should not be. Therefore, there's something going on with the gating CI, because it clearly has Ruby installed which is wrong, it should be a completely pristine environment.

Sigh. I now see the shortcoming of the solution based on the Ruby script. However, I'm still in favour of it due to its simplicity and clarity over the original solution based on some random 3rd party library (or even worse our own vendored version of it) which most likely will end up as an abandonware . I'd solve the unit test issue with the following:

@pytest.mark.skip("Gemfile.lock parser is integration tested due to its dependency on the Ruby runtime")
def test_gemfile_parse() -> None:
    pass

in order to:

not lose coverage otherwise codecov will complain in its reports that we have an uncovered function which would not be true
we need to skip it so that we don't report success on something that doesn't do anything.

Any objections or alternative proposals?

brunoapimentel commented 1 month ago

there's something going on with the gating CI

Ohh, good catch!

I'd solve the unit test issue with the following:
@pytest.mark.skip("Gemfile.lock parser is integration tested due to its dependency on the Ruby runtime")
def test_gemfile_parse() -> None:
    pass
Any objections or alternative proposals?

We don't we simply mock the run_cmd call, and test the rest of the function with mocked output? I assume we're fine with not unit testing the lockfile_parser.rb per se, so this would allow us to test all the rest of the Python code in bundler/parser.py.

eskultety commented 1 month ago

@pytest.mark.skip("Gemfile.lock parser is integration tested due to its dependency on the Ruby runtime")
def test_gemfile_parse() -> None:
    pass

Taking this idea back - integration testing this is not a way. We'd still have to clone a repo with a particular ref just to test these, which doesn't feel right.

We don't we simply mock the run_cmd call, and test the rest of the function with mocked output? I assume we're fine with not unit testing the lockfile_parser.rb per se, so this would allow us to test all the rest of the Python code in bundler/parser.py.

We could do that, yes. Then again, it's our script that we're going to maintain, so we should test its correctness as well. Continuing with your thought we could have this particular test that mocks run_cmd and tests the rest of the python code as you suggest and then we could have an additional test for the Ruby parser which would require Ruby installed in some form which is an idea I discussed privately with @a-ovchinnikov:

we ought not to rely on a particular interpreter to be installed in our unit test suite and so for that proposed test
we'd introduce a new test env variable CACHI2_TEST_EXPENSIVE which would trigger this test (and potentially others which might take longer to execute and so by default it would be disabled)
in such a test (we're still talking unit tests) we'd add a check whether we're running in a GH action environment, which has Ruby installed, and execute the script directly; in local dev envs we'd pull a Ruby (Alpine) image and execute the test in a container environment

That said, if everyone is in favour of including expensive tests as well, we can track that idea as a future improvement, doesn't necessarily need to be part of this particular PR.

slimreaper35 commented 1 month ago

I just added a commit that skips the tests that depend on Ruby. I did not want to mock the run_cmd output with huge JSON output because it is, in the end, a temporary change.

eskultety commented 1 month ago

I just added a commit that skips the tests that depend on Ruby. I did not want to mock the run_cmd output with huge JSON output because it is, in the end, a temporary change.

Doesn't matter. You can define how big you want the JSON (I think you can take a very simple example like 2 dependencies and it won't be that big, will it?), we just want to make sure we cover ideally all python code we introduce. I don't care that much about the output being mocked for now, but I do care about testing our behaviour and agree with @brunoapimentel we should do it.

slimreaper35 commented 1 month ago

Should I add the change as a separate commit or part of the one that implements parsing? I thought that adding a separate commit would make it easy to revert it in the future.

eskultety commented 1 month ago

Should I add the change as a separate commit or part of the one that implements parsing? I thought that adding a separate commit would make it easy to revert it in the future.

Same commit, with a justification. We will not revert it, we'll just update the test in the future such that it:

doesn't mock output of the parser anymore
only runs by default in the gating CI and
is guarded by an environment variable that'd be disabled by default in local development envs

a-ovchinnikov commented 1 month ago

/retest

containerbuildsystem / cachi2

bundler: parse Gemfile.lock #599

Maintainers will complete the following section