casework / CASE-Mapping-Python

Apache License 2.0
0 stars 3 forks source link

Add modules utils under mix_utils folder to avoid duplication in JSON… #36

Closed fabrizio-turchi closed 6 months ago

fabrizio-turchi commented 6 months ago

Add the utils.py module under mix_utils folder to: 1) avoid duplication in the JSON/CASE file generated by the parsers (UFED, AXIOM etc.) 2) amends the data to either make it homogeneous or get rid of some dirty characters extracted from the XML reports.

The module is shared among all parsers (UFED, AXIOM, OXYGEN and XAMN/XRY)

ajnelson-nist commented 6 months ago

Hi @fabrizio-turchi ,

If you are in progress on pushing changes, please leave this PR in Draft status until you are done and ready for it to be reviewed for merging.

Also, your first commit includes a compiled .pyc file. Can you please git rm that, and also include a patch updating .gitignore to ignore __pycache__ and *.pyc? I leave it up to you if you want to revise the history to purge that file.

fabrizio-turchi commented 6 months ago

Hi Alex,

I'll remove the .pyc files and add them to the .gitignore, sorry it slipped my mind!

I stop working on the repo for the moment, so no worry I'll wait for your review.

Best regards.

F.

[cid:0a7159de-aa31-483f-a0a1-da5a8c203905] [facebook]https://www.facebook.com/CNRsocialFB [twitter] https://twitter.com/CNRsocial_ [instagram] https://www.instagram.com/cnrsocial/ [linkedin] https://www.linkedin.com/company/283032 Fabrizio Turchi CNR, ISTITUTO DI INFORMATICA GIURIDICA E SISTEMI GIUDIZIARI [cid:69336b52-6f6a-4130-8edf-0cd2e5bc5fc6] Tel. +39 055 4399672 Cell. +39 328 36 23 632 @.*** via de' Barucci, 20, 50121 – Firenze [cid:856ad76e-5a6b-4d2f-bf5a-0b6b2254710b] www.cnr.ithttp://www.cnr.it/ Devolvi il 5×1000 al CNR CF 80054330586


From: Alex Nelson @.> Sent: 20 February 2024 14:43 To: casework/CASE-Mapping-Python @.> Cc: Fabrizio Turchi @.>; Mention @.> Subject: Re: [casework/CASE-Mapping-Python] Add modules utils under mix_utils folder to avoid duplication in JSON… (PR #36)

Hi @fabrizio-turchihttps://github.com/fabrizio-turchi ,

If you are in progress on pushing changes, please leave this PR in Draft status until you are done and ready for it to be reviewed for merging.

Also, your first commit includes a compiled .pyc file. Can you please git rm that, and also include a patch updating .gitignore to ignore pycache and *.pyc? I leave it up to you if you want to revise the history to purge that file.

— Reply to this email directly, view it on GitHubhttps://github.com/casework/CASE-Mapping-Python/pull/36#issuecomment-1954245633, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEDCCDIQTCGO3XUHSUB2TVLYUSSBJAVCNFSM6AAAAABDPPLSSCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJUGI2DKNRTGM. You are receiving this because you were mentioned.Message ID: @.***>

fabrizio-turchi commented 6 months ago

@ajnelson-nist I turned the class into a function at model level. The test_duplicate.py contains all the ways I invoke the code in the UFED parser. The JSON objects taken into account in the test are a simplified version of the ones included in the parser code but it doesn't affect the result of the test. Feel free to make any kind of comment for the benefit of the final result.

ajnelson-nist commented 6 months ago

I realized a few things running pytest locally. The GitHub Action isn't running on this pull request; it seems we'd forgotten to turn on Actions for pull requests. I'm pushing that change to your main now.

Also, thank you again for using the typing symbols from Python <3.9. But, I'd forgotten that 3.8 is not supported by this package. The CASE validation action only operates in 3.9 or later, and the pyproject.toml file already specifies needing 3.9.

Thank you for the pytest. I was expecting it would operate on your new module as a package, but where you put the test made me realize your code isn't positioned to be a part of the cdo_mapping package. With where the code is, you and any end-users would only be able to use it by GIt submodule incorporation. The mix_utils directory would need to move to under /case_mapping/ to be exposed as part of the package.

The way to see this is to have the file test_duplicate.py move to under /tests, and then get its import line working again. My understanding of your intended use case is that your new module should be usable with an import like how /example.py has the line import case_mapping.

Last, be aware the node identifier style resolves to "@id": str(uuid.uuid4()), and without a @base statement in a graph-wide context dictionary, that attempted @id statement will not work as JSON-LD. (I forget if it will be silently dropped because the UUID is just a string, or if it will break in some other way, but case_validate would fail validation on the generated JSON because, at best, all of the nodes would be blank nodes.) I appreciate this test is designed to not think about the entire graph, but the @ids should at least read (after all the variables are resolved) "@id": "http://example.org/kb/" + str(uuid.uuid4()). kb: could also be fine as a prefix, as that's what's used almost everywhere across the CASE examples.

fabrizio-turchi commented 6 months ago

@ajnelson-nist I tried to use the example module to test the FacetUrlHistory and UrlHistoryEntry classes, but it raises the error: "PackageNotFoundError: case-mapping". Inside the init.py of the package there's a reference to case-mapping but the folder is case_mapping. Could you have a look to it?

ajnelson-nist commented 6 months ago

@fabrizio-turchi : I have a guess why you're seeing that, but first, it would be helpful for review purposes if you would run make in your development environment. This will install pre-commit, by running this recipe. Then if you make any unimportant change to case_mapping/uco/observable.py---say, adding a new blank line at the end of the file---and attempt to commit the change, your code will be formatted as the Continuous Integration in this repository requires. You'll need to commit the changes pre-commit makes, but they should just be code formatting.

Back to your question - it sounds like the library is not installed in your testing environment. This is a reasonable expectation from tests being in separate directories from the package source.

If you run make, a virtual environment will be built (separate from the one pre-commit is installed into - the background on why is in this PR if you're curious), and the local case_mapping package will be installed into it. Activating that virtual environment will let you run pytest:

make
source venv/bin/activate  # Begin use of virtual environment
pytest
deactivate  # End use of virtual environment

make check in this project happens to include running pytest, in this recipe. (The commands pytest and poetry run pytest happen to do the same thing for me.)

If you follow the above steps, you'll find there was a missed str() cast in your last update, but you shouldn't have any trouble with packages.

fabrizio-turchi commented 6 months ago

@ajnelson-nist I wouldn't like to bother you about the error I run into yesterday. I changed the example.py adding the FacetUrlHistory and UrlHistoryEntry Observables, but I had to change the line version = importlib.metadata.version("case-mapping") with version = "xxx". In this way I could run the example to test the changes and later I restored the line. The error remains the same inside the package: importlib.metadata.PackageNotFoundError: case-mapping

fabrizio-turchi commented 6 months ago

@ajnelson-nist I changed the code in the FacetUrlHistory class to fix the errors raised by case_validate. I didn't find a better way to fill in the uco-observable:urlHistoryEntry property. The code is borrowed by the _str_vars, _str_datetime ... methods of the FacetEntity class, but I couldn't use those methods. If you find a more elegant solution, do not hesitate to change teh code or suggest to me how to do it. 10x!

ajnelson-nist commented 6 months ago

@fabrizio-turchi : I've moved some files around so you can meet your original goal of getting a shareable, importable library. /mix_utils/utils.py has moved to case_mapping/mix_utils.py, because otherwise you would not be able to import utils.py as a module. The test you'd added under /mix_utils has moved to /tests, to demonstrate that once case_mapping is installed in a Python environment, mix_utils.py is accessible by import.

I've also added Python type reviews on all of the files that were under /mix_utils, because I caught one subtle JSON-LD issue (a necessary string casting) and I wasn't quite following some of the data-type flow without a signature being written.

If you look at the git log --stat messages, you'll see why I added those changes.

I'm approving this PR. If you agree with the changes, please feel free to merge, and then I'll start reviewing #38 .

Note: I did not review changes pertaining to other Issues. I only checked that the unit testing (case_validate, pytest, and mypy --strict on the mix-utils files) passes. I still think it is fine to merge this PR so 38 can be returned to, and if you think other Issues are resolved with the extra changes in this PR, please feel free to close them as addressed.