Start thinking about how to represent the form's data in a reusable way--not just as a PDF

nonprofittechy commented 3 years ago

We eventually want to be able to store the information captured in the interview in a structured way that captures all of the same information that goes onto the PDF. We may want to do this in two different ways: a version with all of the data, and an anonymized version with just a summary for running reports.
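To make the two-version idea concrete, here is a rough sketch in plain Python. The field names and the `summarize` helper are made up for illustration; nothing here is existing AssemblyLine code, and which fields count as identifying is an assumption.

```python
import json

# Hypothetical full record: everything that would also appear on the PDF.
full_record = {
    "case_type": "eviction_answer",
    "court": "Example Housing Court",
    "users": [{"name": "Alex Example", "zip": "02108"}],
    "other_parties": [{"name": "Example Landlord LLC"}],
    "defenses": ["nonpayment_disputed", "bad_conditions"],
}


def summarize(record):
    """Return an anonymized summary: counts and categories, no names."""
    return {
        "case_type": record["case_type"],
        "court": record["court"],
        "user_count": len(record["users"]),
        "other_party_count": len(record["other_parties"]),
        "defenses": sorted(record["defenses"]),
    }


summary = summarize(full_record)
print(json.dumps(summary, indent=2))
```

The full record would go to the court or a restricted table; only the summary would feed reports.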

Some ideas: extend the ALDocument class so that it also assembles a "shadow" JSON file. We could both send this to the court and store it in a database in searchable JSONB format. This would let us capture the information at the same time we generate the final document for the user/court.

We may also want to format and structure this data to match the fields in ECFv4 or v5; align with the profileMe hackathon experiment; look to NCSC and Schema.org.

At this point, we just need to start thinking about the problem and try to design our classes to allow room for this expansion.

nonprofittechy commented 3 years ago

@BryceStevenWilley one of our ideas was that we should try sending this data to the court even before the court has a place for it to go. Is there an existing metadata field where arbitrary JSON or XML data could be attached?

plocket commented 3 years ago

I think one of the strengths of ALDocument is that it has one job and does it well - assembling the documents. It might be better to keep this data building functionality separate from ALDocument, even if ALDocument ends up utilizing the data that's created.

nonprofittechy commented 3 years ago

> I think one of the strengths of ALDocument is that it has one job and does it well - assembling the documents. It might be better to keep this data building functionality separate from ALDocument, even if ALDocument ends up utilizing the data that's created.

Valid point. Worth keeping simplicity and clarity of purpose in mind as a design goal. I might need to clarify though--what we are thinking about here is the same data that goes into the PDF, but in a structured format rather than a picture that a person has to read. It should not contain different data. I think this could be part of ALDocument in the same way that the addendum code is part of ALDocument. These representations tend to be machine-readable documents, although keeping them indexable and searchable in a database row would be nice. XML is the old wave; JSON is the newer approach but not dominant. For some examples of people who have created structured representations of legal data:

Creating these files might simply mean adding a template as an attribute of the ALDocument class. If it ends up requiring a lot of new code rather than just a template, we could add a new class that ALDocument holds as an attribute.
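A minimal sketch of the second option, where the structured data lives in its own class and the document only holds a reference to it. `DocumentSketch` and `StructuredData` are hypothetical stand-ins written in plain Python; they are not the real ALDocument API, and the PDF bytes are a placeholder.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StructuredData:
    """Hypothetical helper that serializes the same fields the PDF shows."""
    fields: dict = field(default_factory=dict)

    def as_json(self) -> str:
        # sort_keys keeps the output stable, which helps with diffing.
        return json.dumps(self.fields, sort_keys=True)


@dataclass
class DocumentSketch:
    """Stand-in for a document object; NOT the real ALDocument."""
    filename: str
    fields: dict
    data: Optional[StructuredData] = None

    def __post_init__(self):
        # The document owns a StructuredData object but does not build JSON itself.
        if self.data is None:
            self.data = StructuredData(self.fields)

    def assemble(self):
        """Return (pdf_bytes, json_text) built from the same fields."""
        pdf_bytes = b"%PDF-placeholder"  # real PDF assembly is out of scope here
        return pdf_bytes, self.data.as_json()


doc = DocumentSketch("answer.pdf", {"b": 2, "a": 1})
pdf, shadow_json = doc.assemble()
print(shadow_json)
```

This keeps the serialization logic out of the document class while still letting the PDF and the structured data travel together.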

Edit: the other thing to think about is that we will need to generate ECFv4 documents to send to Tyler jurisdictions and in our partnership with Louisiana. If that document is attached to the ALDocument, it will be a lot easier to send the binary PDF blob along with the structured data.

nonprofittechy commented 3 years ago

One thing that I think we really need to think through is the interface to abstract and capture the relevant fields, which will not have a 1-1 mapping to the information needed to fill in the PDF. Some information that goes into the document will be needed just for rendering/display purposes but has no meaning outside of the specific PDF. Most fields will have real semantic meaning. Some will be captured in the document, but not in the fields. For example, there could be hardcoded headings in the PDF that do not map to a field. When you read the PDF you understand what the data means, but just looking at the variables wouldn't tell you what option the user chose.

It is possible that combining a case type (could be implicit information that we want to make explicit--maybe the user indicated it by starting a particular interview, but no variable in the interview state explains it) with all of the fields in the interview state would be enough information. If that feels promising, we could just come up with a deny-list-style approach, where we explicitly remove some irrelevant information from the answer store but default to sending over everything.
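Roughly, the deny-list idea could look like the sketch below. The variable names in `DENY_LIST` and the `export_answers` helper are invented for illustration, and the answer store is modeled as a plain dict rather than real docassemble interview state.

```python
# Hypothetical variable names that only matter for rendering or navigation.
DENY_LIST = {"signature_image", "addendum_overflow_text", "current_page"}


def export_answers(answers, case_type, deny_list=DENY_LIST):
    """Send everything by default, minus denied keys, plus an explicit case type."""
    exported = {k: v for k, v in answers.items() if k not in deny_list}
    # The case type is implicit in which interview was started, so add it explicitly.
    exported["case_type"] = case_type
    return exported


answers = {
    "users_name": "Alex Example",
    "rent_owed": 1200,
    "signature_image": "<binary>",
    "current_page": 3,
}
exported = export_answers(answers, case_type="eviction_answer")
print(exported)
```

New interview variables are exported automatically; only fields known to be irrelevant need to be listed.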

We may also need to add new data structures to capture information, though: for example, relationship information that is only implicit in the variable names but that the XML/JSON standard expects to be spelled out, or reformatting the users and other_parties variables back to Plaintiff/Defendant if the users/other_parties split is not relevant to the canonical representation.
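As an illustration of the users/other_parties point, here is a small sketch that turns the two lists into explicit roles. The `to_canonical_parties` function and the output keys (`role`, `is_filer`) are assumptions for the example, not fields taken from ECF or any other standard.

```python
def to_canonical_parties(users, other_parties, user_role):
    """Map users/other_parties name lists onto explicit Plaintiff/Defendant roles."""
    if user_role not in ("Plaintiff", "Defendant"):
        raise ValueError("user_role must be 'Plaintiff' or 'Defendant'")
    other_role = "Defendant" if user_role == "Plaintiff" else "Plaintiff"
    parties = [
        {"name": name, "role": user_role, "is_filer": True} for name in users
    ]
    parties += [
        {"name": name, "role": other_role, "is_filer": False}
        for name in other_parties
    ]
    return parties


parties = to_canonical_parties(
    users=["Alex Example"],
    other_parties=["Example Landlord LLC"],
    user_role="Defendant",
)
print(parties)
```

The role has to be supplied from outside because, as noted above, it is often implied by the interview rather than stored in a variable.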

All that said: just sending over almost all of the field data feels like the place to start. And having uniform variable name standards, incorporating taxonomies like NSMIv2, etc. will all have a big impact on making this useful. This is a little bit proactive; the only information we know the court wants is the fields in ECFv4, which are very minimal.

SuffolkLITLab / docassemble-AssemblyLine

Start thinking about how to represent the form's data in a reusable way--not just as a PDF #53