XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
80 stars 136 forks source link

Move markdown to pyxform functionality out of tests and into main package #599

Closed ukanga closed 1 month ago

ukanga commented 2 years ago

⚠️ See this comment for how we've decided to proceed.

Software and hardware versions

pyxform v1.7.0 - v1.9.0, Python 3.8.13

Problem description

I have been using the functionality within pyxform_test_case in writing tests using markdown to minimise or eliminate the use of Excel files within the codebase. I believe the change introduced by the PR https://github.com/XLSForm/pyxform/pull/562 eliminated the inclusion of the package during distribution and as such the module is not available in third party code bases that rely on it.

Steps to reproduce the problem

Python 3.8.13 (main, Mar 24 2022, 04:09:24) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from pyxform.tests.pyxform_test_case import PyxformMarkdown
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pyxform.tests'

Expected behaviour

Could we make this module available under pyxform.test_utls or any relevant path and allow the continued use by third party code bases for writing tests? The goal is the same as was for this package to reduce the use of binary XLSForm files.

ukanga commented 2 years ago

@lindsay-stevens @lognaturel Thoughts?

lognaturel commented 2 years ago

minimise or eliminate the use of Excel files within the codebase.

The codebase you're referencing here is your external system (presumably Ona)? Are you writing additional tests that verify the conversion functionality? If so, could those be added here? Or are you using pyxform_test_case to test Ona functionality somehow? Maybe you're just using PyxformMarkdown, I guess that's what your sample import suggests? That's definitely unexpected usage to me so we should figure out just what the public API should be and make it clearly a public API.

lognaturel commented 2 years ago

Ah, actually, I think I just remembered that Ona uses the pyxform json representations internally instead of the actual XForm, is that right? So you use that pyxform json representation to test Ona features that interact with the form def in some way?

ukanga commented 2 years ago

We use both the JSON as well as the XForm representation. Yes, in my case this is onadata, for example we use the method self.md_to_pyxform_survey(md_xlsform, kwargs=kwargs) then access the XML and JSON representation accordingly which is the goal.

survey = self.md_to_pyxform_survey(md_xlsform, kwargs=kwargs)
...
xml = survey.to_xml()
json = survey.to_json()
...
ukanga commented 2 years ago

I have not done much isolation other than treating the two files as all public https://github.com/XLSForm/pyxform/pull/600/files.

lognaturel commented 2 years ago

Thanks for clarifying the usage! That all makes sense.

My preference would be to make PyxformMarkdown into its own module (e.g. md2pyxform) if it needs to be part of the public interface. It doesn't feel inherently test-related. As far as I can tell there's no reason for it to be a class and that would avoid multiple inheritance. Would all that make sense to you, @ukanga?

@lindsay-stevens I'd also like your thoughts on what a good structure would be.

lindsay-stevens commented 2 years ago

In my view tests are private code, and as a result any changes to the tests are not (usually) mentioned in the changelog and these changes don't affect versioning. So I don't think it's good practice for 3rd party code to rely on tests. I wonder if those 3rd party tests using the pyxform test code are perhaps tests that should be contributed as pyxform tests?

Another issue for onadata is that it seems to support both python 3 and 2, whereas pyxform supports only 3.7+ only.

Anyway, some suggestions:

  1. No change: the PyxformMarkdown class and dependency function md_table_to_ss_structure are a total ~100 lines - 3rd party libs could copy this to their codebase with the pyxform license copyright notice.
  2. No change: 3rd party libs requiring pyxform test code can clone the repo, or install from GitHub instead of PyPi e.g. pyxform @ git+https://github.com/XLSForm/pyxform@v1.9.0#egg=pyxform_tests#subdirectory=tests.
  3. Packaging change: pyxform publishes a separate tests package on PyPi such that 3rd party libs can have deps on e.g. "pyxform" and "pyxform_tests" (more or less same outcome as above option 2).
  4. Code change: we polish up the test markdown parser functions (e.g. no class, support escaping pipes per GFM tables spec, upgrade parser to use scanner like in #578) and put them in the main pyxform package under md2pyxform (or similar name).

Option 4 could have other benefits beyond meeting the request in this ticket:

  1. If markdown is sufficiently incorporated into the pyxform codebase, users could use it as a plain-text (git trackable) input option as an alternative to XLS/XLSX. This may obsolete the yxf project which for the same reason is doing this sort of thing with YAML.
  2. The markdown code could be used in a discourse bot that converts or validates markdown forms in the ODK forum into XForms 🤖
ukanga commented 2 years ago
  1. No change: the PyxformMarkdown class and dependency function md_table_to_ss_structure are a total ~100 lines - 3rd party libs could copy this to their codebase with the pyxform license copyright notice.

Seems like a path of least effort, will consider this in the short term.

  1. No change: 3rd party libs requiring pyxform test code can clone the repo, or install from GitHub instead of PyPi e.g. pyxform @ git+https://github.com/XLSForm/pyxform@v1.9.0#egg=pyxform_tests#subdirectory=tests.

I did not have much success with this option.

  1. Code change: we polish up the test markdown parser functions (e.g. no class, support escaping pipes per GFM tables spec, upgrade parser to use scanner like in 495: Default string values with dashes are mistakenly treated as dynamic defaults #578) and put them in the main pyxform package under md2pyxform (or similar name).

I'm keen on Option 4, having it as part of the library will make things far much easier.

I will take down the PR #600. and leave the issue open hoping we pursue option 4 instead.

lognaturel commented 2 years ago

I like the way y'all think!

a discourse bot that converts or validates markdown forms in the ODK forum into XForms 🤖

Wild.