Closed fititnt closed 2 years ago
Damn, it worked already on the first commit.
https://travis-ci.com/github/EticaAI/HXL-Data-Science-file-formats/
# .travis.yml
# @from https://github.com/tox-dev/tox-travis
language: python
python:
- "3.7"
- "3.8"
- "3.9"
install: pip install tox-travis
script: tox
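For context: tox-travis maps each entry under `python:` to the matching tox environment, so the repository also needs a `tox.ini`. The one below is a hypothetical minimal sketch (env names and commands are my assumptions, not the project's actual file):

```ini
# tox.ini -- hypothetical minimal pairing for the .travis.yml above.
# tox-travis selects py37/py38/py39 based on the Travis "python:" entry.
[tox]
envlist = py37, py38, py39

[testenv]
deps = pytest
commands = pytest
```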
While for hxlm.core, in particular the Htypes, it makes sense to test the functions directly, for hdpcli it seems reasonable, at least in the short term, to test at a higher level, since the internals are changing (from this comment): https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/16#issuecomment-802424548
Ok, just discovered this doctest thing https://docs.python.org/3/library/doctest.html.
So this python3 -m doctest -v hxlm/core/schema/vocab.py
could be used to test what is documented like this:
class HVocabHelper:
    # (.......)
    def get_value(self, dotted_key: str, default: Any = None) -> Any:
        """Get value by dotted notation key

        Examples:
            >>> from hxlm.core.schema.vocab import HVocabHelper
            >>> HVocabHelper().get_value('datum.POR.i')
            >>> HVocabHelper().get_value('attr.datum.POR.id')
            'dados'

        Args:
            dotted_key (str): Dotted key notation
            default ([Any], optional): Value if not found. Defaults to None.

        Returns:
            [Any]: Return the result. Defaults to default
        """
        keys = dotted_key.split('.')
        return functools.reduce(
            lambda d, key: d.get(key) if d else default,
            keys,
            self._vocab_values
        )
Output
fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ python3 -m doctest -v hxlm/core/schema/vocab.py
Trying:
from hxlm.core.schema.vocab import HVocabHelper
Expecting nothing
ok
Trying:
HVocabHelper().get_translation_value('attr.datum.POR.id')
Expecting:
'dados'
ok
Trying:
HVocabHelper().get_translation_value('datum.POR.id')
Expecting:
'dados'
ok
Trying:
from hxlm.core.schema.vocab import HVocabHelper
Expecting nothing
ok
Trying:
HVocabHelper().get_value('datum.POR.i')
Expecting nothing
ok
Trying:
HVocabHelper().get_value('attr.datum.POR.id')
Expecting:
'dados'
ok
15 items had no tests:
vocab
vocab.ConversorHSchema
vocab.ConversorHSchema.__init__
vocab.HVocabHelper
vocab.HVocabHelper.__init__
vocab.ItemHVocab
vocab.ItemHVocab.__eq__
vocab.ItemHVocab.__init__
vocab.ItemHVocab.__repr__
vocab.ItemHVocab.diff
vocab.ItemHVocab.merge
vocab.ItemHVocab.parse_yaml
vocab.ItemHVocab.to_dict
vocab.ItemHVocab.to_json
vocab.ItemHVocab.to_yaml
2 items passed all tests:
3 tests in vocab.HVocabHelper.get_translation_value
3 tests in vocab.HVocabHelper.get_value
6 tests in 17 items.
6 passed and 0 failed.
Test passed.
Note: While I'm not 100% sure these doctests can be added to tox (see https://stackoverflow.com/questions/49254777/how-to-let-pytest-discover-and-run-doctests-in-installed-modules; I did not test it), it seems that at least it is possible to run them manually with python3 -m doctest -v hxlm/core/schema/vocab.py.
WOW, it worked on the first try! I'm getting good at this. All that past work with Ansible testinfra gave a hint!
pytest -vv hxlm/ --doctest-modules
Context:
- pytest, by default, requires test files in the repository tests/ folder
- pytest --doctest-modules is different: it tests the docstrings in all the files
- testinfra allows testing an entire infrastructure (like checking if services are running and running shell commands; actually, testinfra allows testing REMOTE infrastructure, via SSH, using local files)
It seems that, in theory (see this post/comment https://github.com/pytest-dev/pytest/issues/2042#issuecomment-381309723), pytest does not allow explicitly running both the tests/ folder and the Python doctests in one invocation. But using testinfra we can simply simulate running the entire pytest -vv hxlm/ --doctest-modules.
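As an aside: pytest's own configuration can also opt in to doctest collection on every run, via the documented `addopts` and `testpaths` options. Whether this plays well with this repository's layout is untested here; a sketch:

```ini
# pytest.ini (or the [tool:pytest] section of setup.cfg) -- untested sketch.
# With this, a bare `pytest` collects both tests/ and the hxlm doctests.
[pytest]
addopts = --doctest-modules
testpaths =
    tests
    hxlm
```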
So, it worked!
I will leave here how it's done, since I know it can be used by others on other projects (or at least help my future self a lot).
tests/test_zzz_doctest.py
# (...)
def test_pytest_doctest_modules_all_may_have_false_positives(host):
    """Run pytest -vv hxlm/ --doctest-modules

    WARNING: the test_zzz_doctest.py MAY return false positives (e.g. test
    doctest code even outside the hxlm module). Consider temporarily
    disabling this test file and running
        pytest -vv hxlm/ --doctest-modules
    manually.
    """
    cmd = host.run("pytest -vv hxlm/ --doctest-modules")
    # cmd = host.run("pytest --doctest-modules")
    print('cmd.stdout')
    print(cmd.stdout)
    print('cmd.stderr')
    print(cmd.stderr)
    assert cmd.succeeded
# (...)
hxlm/core/util.py
Just an example with doctest
# (...)
@lru_cache(maxsize=128)
def load_file(file_path: str, delimiter: str = ',') -> Union[dict, list]:
    """Generic simple file loader (YAML, JSON, CSV) with cache.

    Args:
        file_path (str): Path or bytes for the file
        delimiter (str): Delimiter. Only applicable if it is a CSV/TSV-like item

    Returns:
        Union[dict, list]: The loaded file result

    >>> import hxlm.core as HXLm
    >>> file_path = HXLm.HDATUM_UDHR + '/udhr.lat.hdp.yml'
    >>> hsilo_example = load_file(file_path)
    >>> hsilo_example[0]['hsilo']['tag']
    ['udhr']
    """
    with open(file_path, 'r') as stream:
        if file_path.endswith('.json'):
            return json.load(stream)
        if file_path.endswith('.yml'):
            return yaml.safe_load(stream)
        if file_path.endswith('.csv'):
            reader = csv.reader(stream, delimiter=delimiter)
            result = []
            for row in reader:
                result.append(row)
            return result
    raise SystemError('Unknown input [' + str(file_path) + ']')
# (...)
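A self-contained sketch of the same pattern, restricted to stdlib formats (JSON/CSV) so it runs without PyYAML. The file name is a temporary stand-in; the point is to show the `lru_cache` behavior, including the caveat that repeated calls return the very same cached object:

```python
import csv
import json
import tempfile
from functools import lru_cache
from pathlib import Path
from typing import Union


@lru_cache(maxsize=128)
def load_file(file_path: str, delimiter: str = ',') -> Union[dict, list]:
    """Same shape as hxlm.core's loader, but stdlib-only (no YAML)."""
    with open(file_path, 'r') as stream:
        if file_path.endswith('.json'):
            return json.load(stream)
        if file_path.endswith('.csv'):
            return list(csv.reader(stream, delimiter=delimiter))
    raise SystemError('Unknown input [' + str(file_path) + ']')


tmp = Path(tempfile.mkdtemp())
(tmp / 'demo.json').write_text('{"hsilo": {"tag": ["udhr"]}}')

data = load_file(str(tmp / 'demo.json'))
print(data['hsilo']['tag'])                        # ['udhr']
# lru_cache serves the second call from cache: same object, not a copy
print(load_file(str(tmp / 'demo.json')) is data)   # True
```

Because of the cache, edits to a file after the first load will not be seen until the cache is cleared (`load_file.cache_clear()`), and mutating a returned dict mutates it for every later caller too.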
To allow tests with JavaScript, I think we could use GitHub Pages. But the markdown from hdp-conventions/README.md makes Jekyll sad.
Since HXL-Data-Science-file-formats is a really huge name, I guess we could use, mostly for the sake of testing, a subdomain from @EticaAI.
Ok. The #18 HDPLisp prototype on the Racket platform is starting to get complicated (it's actually my first Racket package, so it's complicated because there is a lot of mental context switching).
With the exception of the JavaScript draft, everything implemented after Tox is automatically tested. So, since the Racket prototype is likely to become the reference version, I think it's worth spending some time to set up automated testing from the start.
This is also likely to save time upfront, both for new people and for myself when doing quick updates across several host platforms (Python, JavaScript + Node.js/browser, Racket).
Wonderful. It worked on the second try and without errors 😍.
Ok. Now some refactoring. I think we could move hxlm/data/ontologia/ to ontologia/ and then make a symbolic link. The ontologia is becoming the most important part of something that could resemble an hdp-toolchain.
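The move-and-symlink idea can be sketched with plain shell. The block below runs in a scratch directory with an illustrative file name (not the repository's real contents); inside the actual git repository one would use `git mv` instead of `mv` so history is preserved:

```shell
# Scratch-directory sketch of the proposed refactoring. Paths mirror the
# comment above; the YAML file name is made up for illustration.
set -e
demo="$(mktemp -d)"
mkdir -p "$demo/hxlm/data/ontologia"
echo 'example: true' > "$demo/hxlm/data/ontologia/example.yml"
cd "$demo"

mv hxlm/data/ontologia ontologia           # promote to the top level
ln -s ../../ontologia hxlm/data/ontologia  # keep the old path working

cat hxlm/data/ontologia/example.yml        # still readable via the symlink
```

The relative link target `../../ontologia` is resolved from hxlm/data/, so the old path keeps working no matter where the repository is checked out.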
This was already done some time ago. Some fixes may still be relevant, but I'm closing for now.
From the tools on EticaAI/HXL-Data-Science-file-formats, the drafted (not yet even a proof of concept) library temporarily called
hxlm.core
(see hxlm #11) is already accruing essential complexity. Even if this library ends up used mostly by a few people at @EticaAI/@HXL-CPLP, I believe that the bare minimum would be to add tests so new features don't break past implementations or, if they have to break, at least we know when and what. This dedicated issue is mostly to have public references if others need to set up similar features. Also, continuous integration on its own is different from code.
Context
The current hxlm.core is written in Python. While the concept was born from a single all-in-one file, HXLMeta (see hxlquickmeta (cli tool) + HXLMeta (Usable Class) #9), we're drafting a concept (that may be too hard to be feasible) of Declarative Programming (see comment https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/11#issuecomment-788651928) and using YAML syntax to, at least:
In the context of the original idea of HXL-Data-Science-file-formats, the point is to have minimum viable products that enforce what is "right" and what is "wrong", and tooling systems that deal with the technical parts, which could allow exchanging sensitive data at a fast pace while still respecting laws. There is also a need for those who route data (either semi-automated with a human, or totally automated HRouting) to be able to parse the metadata without themselves seeing the sensitive data.
Please note that even if point 2 (hcompliance) does have MVPs and plans from the start to allow even automated auditing, the idea is to ease the work of those who already share sensitive data and need to make decisions quickly on someone else's behalf, while (if necessary) keeping logs of what was done.
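For what it's worth, the only shape of these YAML files that can be inferred from the load_file doctest earlier in this thread (hsilo_example[0]['hsilo']['tag'] returning ['udhr']) is something like the following; every other key is unknown to me, so treat this purely as a hypothetical illustration:

```yaml
# Hypothetical minimal *.hdp.yml shape, inferred only from the doctest
# earlier in this thread; the real files certainly have more keys.
- hsilo:
    tag:
      - udhr
```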
Yes, in such a context the idea of automated tests is not overkill compared to the full thing.
Also, all the things here are dedicated to the public domain, including the use of testing.
Evaluating continuous integration tools
At this moment I'm not sure if I should use GitHub Actions (which seems to be the new thing) or something more traditional like Travis CI (the open source version).
I know that Travis allows very generous CPU time limits for open source. Not sure about GitHub. I know I could do something like set up a Jenkins, but since I'm also the one writing the Python code (and I have no money to keep yet another server running for years; and this may require knowing past issues), I think Jenkins is not an option now.