WICG / webpackage

Web packaging format

Form use cases #606

Open dgp1130 opened 3 years ago

dgp1130 commented 3 years ago

Forms as files

Today, there are countless companies around the world which send simple forms as files to collect information from customers. The use case essentially works like:

  1. Business creates a form document requesting some information.
  2. Business sends the form to a customer.
  3. Customer fills out their personal info.
  4. Customer sends the form back to the business.
  5. Business opens the form to read the information the customer has provided.

This general user flow has existed for a long time and has often been satisfied by PDF files. However, PDFs have a lot of their own problems, in areas such as (but not limited to) ownership of the format, software requirements, scripting, accessibility, and security.

While web-based forms and products like Google Forms have provided alternatives to PDFs, these are generally server-based systems, requiring a software developer to implement the form or trust in a third-party to keep the form service available and secure. Historically, "email a form to a customer" is simply not possible with existing web technologies.

Using Web Bundles

With Web Bundles, however, this is now possible. I can build a simple web form using existing web technologies and package it into a *.wbn file. I can email this file to a customer, and when they double-click it (assuming Web Bundles are fully supported by most web browsers), it can open in the user's browser and display my form for them to fill out.

However, there is one critical limitation: a user cannot "save" their information into the form and pass it back to me, because Web Bundles (as I understand them) are effectively immutable to users. I think this could be solved by a simple JavaScript API. Consider the following strawman API usage:

// HTML contains `<input type="text" id="first-name" />`
const firstNameInput = document.querySelector('input#first-name');

// When the user clicks a "Save" button...
document.querySelector('button#save').addEventListener('click', async () => {
  // Serialize form data to a `user-data.json` file stored in the `*.wbn` file.
  await webbundle.writeAppDataFile('./user-data.json', JSON.stringify({
    firstName: firstNameInput.value,
  }));
});

// When the `*.wbn` file is opened...
// (The page `load` event is dispatched to `window`; a listener on `document` does not receive it.)
window.addEventListener('load', async () => {
  // Deserialize form data from the `user-data.json` file previously written to the `*.wbn` file.
  const data = await webbundle.readAppDataFile('./user-data.json');
  const json = JSON.parse(data);

  // Update the UI with the loaded data.
  firstNameInput.value = json.firstName;
});

This would allow a customer to fill out a *.wbn-based form with their information and save it back to the original file. The user could then send the changed file back to the business (via email for example), and when the business opens it, they will see all the original user's information. This is effectively equivalent to modern PDF-based workflows, and unlocks a whole suite of file-based possibilities that were previously impossible with web-based tooling.

There's a discussion to be had about whether LocalStorage/IndexedDB/etc. should also serialize to the *.wbn file. I'm using a distinct API here for demonstration/clarity purposes; see the Security section for more discussion about this.

In this example, webbundle.readAppDataFile() and webbundle.writeAppDataFile() will allow devs to read/write virtual files stored inside the *.wbn file. The files could have any name, directory structure, or format. In this case, we're writing a JSON file, but you could use any serialization mechanism. Binary versions of these functions would also be useful for reading/writing binary data such as photos, videos, or even protocol buffers.
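To make the intended semantics a bit more concrete, here is a minimal in-memory sketch of how I imagine these functions behaving. Everything in it is an assumption for illustration: `createFakeWebBundle`, the `...Binary` method names, the `./` path normalization, and returning `undefined` for a missing file are all hypothetical, and none of this is an existing browser API.

```javascript
// Hypothetical in-memory stand-in for the strawman `webbundle` API.
// A real implementation would persist these virtual files inside the *.wbn file.
function createFakeWebBundle() {
  const files = new Map(); // normalized path -> Uint8Array
  const encoder = new TextEncoder();
  const decoder = new TextDecoder();

  // Assumption: './user-data.json' and 'user-data.json' name the same file.
  const normalize = (path) => path.replace(/^\.\//, '');

  return {
    // Text variants: store the string as UTF-8 bytes.
    async writeAppDataFile(path, text) {
      files.set(normalize(path), encoder.encode(text));
    },
    async readAppDataFile(path) {
      const bytes = files.get(normalize(path));
      // Assumption: a missing file yields `undefined` rather than throwing.
      return bytes === undefined ? undefined : decoder.decode(bytes);
    },

    // Binary variants (hypothetical names) for photos, videos, protocol buffers, etc.
    async writeAppDataFileBinary(path, bytes) {
      files.set(normalize(path), Uint8Array.from(bytes)); // store a copy
    },
    async readAppDataFileBinary(path) {
      const bytes = files.get(normalize(path));
      return bytes === undefined ? undefined : Uint8Array.from(bytes); // return a copy
    },
  };
}

// Usage: write a JSON file, then read it back.
const fakeBundle = createFakeWebBundle();
fakeBundle
  .writeAppDataFile('./user-data.json', JSON.stringify({ firstName: 'Douglas' }))
  .then(() => fakeBundle.readAppDataFile('./user-data.json'))
  .then((text) => console.log(JSON.parse(text).firstName));
```

One thing this sketch surfaces: the first time a form is opened there is no saved file yet, so whatever the real API returns in that case, the load handler needs to deal with it before calling `JSON.parse()`.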

Benefits of using *.wbn

Using web technologies can provide many improvements to the problems with existing document formats (PDFs):

  1. Ownership: Being a free and open web standard democratizes this feature. There is no conflicting implementation like Adobe Reader that might confuse implementations of the standard. Important updates like the ECMAScript version can be decoupled and provided implicitly by the browser.
  2. Software requirements: Only a browser, a piece of software already used by the vast majority of users, is required to view and fill out a web form. Software to build web forms already exists and could easily be tweaked to provide Web Bundle functionality.
    • Imagine Squarespace, Wix, Google Forms, or any other "build your own website/web form" tool with an "Export as *.wbn" button. These tools could cover all skill levels from a business user making a trivial form to a web developer building a complex Angular application.
  3. Scripting: Browsers already support and maintain a standardized and up-to-date ECMAScript implementation. Developers benefit from more consistent API interfaces and any additional features provided by browsers that may not be supported by PDFs out of the box.
  4. Accessibility: Browsers and the web already have an extensive a11y model and tooling, working well with screen readers and other a11y tools. Developers also have common and well-known patterns for building accessible sites which would also apply to *.wbn files.
  5. Security: Browsers provide a comprehensive sandbox for executing code safely and can provide much stronger security guarantees. The web also already has privacy patterns for using various APIs via appropriate permissions, allowing *.wbn files to safely take advantage of features like geolocation, file pickers, network requests, Bluetooth, WebUSB, etc. without compromising user privacy.

Example use cases

Modeling forms as a web application provides all the power that comes with such technology to the form itself, allowing these forms to do a lot more for users than a traditional PDF. Take a few examples:

  1. A "Fill out with LinkedIn" button which allows a user to perform a federated sign on to their LinkedIn account and then click a single button to auto-populate all their employment information directly into the form. This could even be a library provided by LinkedIn and integrated with web form builders to make it trivial to add for non-technical users creating a simple job application.
  2. A form input which autocompletes itself based on a query to some (possibly authenticated) backend service. Network requests can provide cookies as appropriate in order to call a protected service and allow a user to choose options from it.
    • For example: "Which of your reports would you like to promote?" can verify the name given is actually your report and fetch related info like employee ID and salary information to auto-fill other parts of the form.
  3. A "Sign with X" service that allows a user to authenticate with an identity provider (Google, Microsoft, Government systems, ...) and click a "Sign" button to digitally attribute all the information to the current user securely.
    • This one may have security/legal hurdles, but I personally hate the "print out a form, sign it, and then scan it back" workflow and would really love to find a way to do this digitally.
  4. A video embedded in the form (that is, hosted externally but embedded via a <video /> tag) which describes the form and how to fill it out.

Tooling

Additional tooling could be built to extract user data from the file and process it in an automated fashion, perhaps writing to a database, generating a spreadsheet, or performing analytics. Since user data is stored in a virtual file system within the *.wbn file, tools and libraries could be built to easily extract all or part of this file system. This can also allow businesses to scale up, starting with a simple *.wbn form created by some drag-and-drop editor, then later creating a more comprehensive, hosted web application and migrating all the existing Web Bundle data into the new database. In fact, database systems could have an "Import from *.wbn" feature similar to "Import from *.csv", to easily extract user data and drop all the web resources.

It may also be beneficial to have a more structured API and file format. The webbundle.writeAppDataFile() API allows creation of individual files of any format, which is very flexible but would likely make it difficult to implement a useful "Import from *.wbn" feature without knowledge of the specific form being imported. A more structured API might provide something like:

// Write the following key-value pairs to a "metadata" file in the Web Bundle.
await webbundle.writeMetadata({
  'firstName': 'Douglas',
  'lastName': 'Parker',
  'age': 26,
  'email': 'dgp1130@noreply.users.github.com',
  // ...
});

// Read all key-value pairs from the metadata file in the Web Bundle.
const { firstName, lastName, age, email } = await webbundle.readMetadata();

This works as a simple key-value data store, and could write to a special location in the Web Bundle file. This could be extracted by automated tooling in a generic fashion, easily populating a database or a spreadsheet based on the keys. This more structured API could certainly grow over time to satisfy more use cases, such as storing photos or scanned pages of a document.
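As a rough sketch of what that generic tooling might look like, the snippet below turns metadata objects from several filled-out forms into a single CSV without knowing anything about the specific form. It assumes the key-value pairs have already been extracted from each bundle (how that extraction works is not specified here), and `metadataToCsv` and `escapeCsvField` are hypothetical helper names.

```javascript
// Quote a CSV field only when it contains a comma, quote, or line break.
function escapeCsvField(value) {
  const text = value === undefined || value === null ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV from metadata records, one row per filled-out form.
// Columns are the union of all keys, in first-seen order.
function metadataToCsv(records) {
  const columns = [];
  for (const record of records) {
    for (const key of Object.keys(record)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  const header = columns.map(escapeCsvField).join(',');
  const rows = records.map((record) =>
    columns.map((column) => escapeCsvField(record[column])).join(','));
  return [header, ...rows].join('\n');
}

// Usage: two forms that did not fill in exactly the same keys.
console.log(metadataToCsv([
  { firstName: 'Douglas', lastName: 'Parker', age: 26 },
  { firstName: 'Ada', lastName: 'Lovelace', email: 'ada@example.com' },
]));
```

Because the columns come from the keys themselves, forms that leave a field out simply produce an empty cell.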

This model also maps to existing HTML form semantics. Most existing web forms are a bunch of <input type="..." /> elements inside a <form /> with implied serialization mechanisms. This API could be extended with a new value for the existing action attribute: <form action="webbundle" />. This would tell browsers to interpret <button type="submit" /> as a "Save" button, which serializes the form data into this Web Bundle metadata format based on the name attribute of the form elements. This is exactly like submitting a web form, except using Web Bundle metadata rather than an HTTP GET or POST request to a server. This would also support <noscript /> users and allow form generator tools to emit a complete form without a bunch of custom JavaScript embedded.
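A sketch of the serialization step a browser might perform for such a form is below. It takes any iterable of `[name, value]` pairs, which in a page could come from `new FormData(form)`. The function name `entriesToMetadata` and the choice to collect repeated names (for example a checkbox group) into an array are my assumptions, not part of any spec.

```javascript
// Collapse form entries into a flat key-value object keyed by the `name` attribute.
function entriesToMetadata(entries) {
  const collected = new Map();
  for (const [name, value] of entries) {
    if (!collected.has(name)) {
      collected.set(name, value);
    } else {
      // Assumption: repeated names become an array of values, in document order.
      const existing = collected.get(name);
      collected.set(name, Array.isArray(existing) ? [...existing, value] : [existing, value]);
    }
  }
  return Object.fromEntries(collected);
}

// Usage with plain pairs standing in for `new FormData(form)`.
const metadata = entriesToMetadata([
  ['firstName', 'Douglas'],
  ['lastName', 'Parker'],
  ['interests', 'web'],
  ['interests', 'forms'],
]);
console.log(JSON.stringify(metadata));
```

The result has the same shape as the object passed to the strawman `webbundle.writeMetadata()` call above, so the declarative and scripted paths would end up writing the same data.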

Security

There are a couple of security caveats to consider, and likely more that I have not considered. I'm also not totally familiar with the expected security model of Web Bundles, but these are some initial thoughts I have.

While I'm not totally familiar with Signed HTTP Exchanges, if a *.wbn file is somehow signed, user data would need to be kept separate from the signed application and the file format would need to account for that. This way, a user can fill out a form with their own information without removing the signature on the actual application data.
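To illustrate the separation, here is a toy model in Node.js. It is not the real *.wbn layout or the Signed HTTP Exchanges mechanism: the `resources`/`userData` split and both function names are hypothetical, and a plain SHA-256 digest stands in for an actual signature. The only point is that whatever gets verified covers the application resources and nothing else.

```javascript
const { createHash } = require('crypto');

// Digest over application resources only. User data is deliberately excluded,
// so filling out the form cannot invalidate it.
function digestApplicationResources(bundle) {
  const hash = createHash('sha256');
  for (const path of Object.keys(bundle.resources).sort()) {
    hash.update(path);
    hash.update('\0');
    hash.update(bundle.resources[path]);
    hash.update('\0');
  }
  return hash.digest('hex');
}

// Return a copy of the bundle with one user data file added or replaced.
function withUserData(bundle, path, contents) {
  return { ...bundle, userData: { ...bundle.userData, [path]: contents } };
}

// Usage: the digest is the same before and after the customer saves their data.
const blankForm = {
  resources: { 'index.html': '<form></form>', 'app.js': 'console.log("form")' },
  userData: {},
};
const filledForm = withUserData(blankForm, 'user-data.json', '{"firstName":"Douglas"}');
console.log(digestApplicationResources(blankForm) === digestApplicationResources(filledForm));
```

Changing anything under `resources` changes the digest, which is the property a signed bundle would rely on to detect tampering with the application itself.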

While user data is stored in the *.wbn file much like the original web application resources, it should not be accessible via XHR, or else malicious users could attempt to inject or rewrite existing application resources to compromise others. Applications should explicitly call webbundle.readAppDataFile() to access and sanitize user data while trusting local application resources.
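As an example of what "sanitize" could mean for the first-name form from earlier, the sketch below treats the saved file as untrusted input: it tolerates missing or malformed JSON, copies only the fields the form knows about, and checks their types. `parseUserData` and `MAX_FIELD_LENGTH` are hypothetical names, and the length limit is an arbitrary choice for illustration.

```javascript
const MAX_FIELD_LENGTH = 200; // arbitrary cap for illustration

// Parse the text read from the user data file and return only known,
// well-typed fields. Anything unexpected falls back to a blank form.
function parseUserData(text) {
  const clean = { firstName: '' };

  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch {
    return clean; // missing file or malformed JSON
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return clean; // valid JSON, but not the object we expect
  }

  if (typeof parsed.firstName === 'string') {
    clean.firstName = parsed.firstName.slice(0, MAX_FIELD_LENGTH);
  }
  return clean; // unknown keys are dropped
}

// Usage: extra keys injected by whoever last held the file are ignored.
console.log(parseUserData('{"firstName":"Douglas","extra":"<script>alert(1)</script>"}'));
```

The page would then assign the result to `firstNameInput.value` as in the earlier example, rather than inserting it as HTML.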

Existing client-side storage mechanisms like IndexedDB, LocalStorage, and cookies should be private to a single user and not serialized to the *.wbn file. This allows a browser to store private user information (such as authentication tokens) without leaking it into the document and to whoever might view it next. It also protects common libraries from accidentally storing information into the *.wbn file without considering the use case.

The form use case is a possible malware vector. A business could send an innocuous form to a customer, who sends it back with malware added. This is already a problem with PDFs; however, PDF readers tend to be clearer about when they are running scripts within a document, possibly requiring users to explicitly enable scripts before executing them. With a *.wbn file, JavaScript is likely to be much more common than PDF scripts, and users are more likely to have it enabled. While browsers already have <noscript /> support, it may be prudent to display options for configuring script execution more prominently. Signed *.wbn files could prevent malicious users from modifying the contents; however, many tools and non-technical users may create legitimate forms that are not signed and are therefore open to abuse. Enabling/requiring tools that generate *.wbn files to sign their outputs easily (via signed exchanges or other mechanisms) and surfacing unsigned files just like non-HTTPS sites would be the best way to mitigate potential problems here.

Other Thoughts

I'm honestly not that familiar with PDFs from a technical perspective, so please fact-check me on statements there. All I know is that every interaction I have with PDFs is painful and I find existing web technologies to be far more convenient and usable. The prevalence of web forms already shows this; however, PDFs still have an effective monopoly on representing a form as a simple file. I believe that conceptual model and its ease of use are the main things the web ecosystem lacks. Web Bundles provide a direct answer to that problem, and with only a few simple tweaks, I think this use case could be served much better than existing tools on the market.

I'm not sure if the Web Bundles specification has an explicit "goal", but I'm guessing form use cases like this aren't really included there. This idea is likely a bit of a tangent from the original intent of Web Bundles, but I think the technology is 95% of the way there already and I hope this is something which is considered to be worth exploring. The only real proposal here is to add an API to write user data to a Web Bundle. The rest of this is justification and speculation about the impact and use cases for such a feature.

From a technical perspective, I think supporting form use cases with *.wbn is relatively straightforward as web specs go. As with any user-facing disruptive technology, the trickiest part with landing this idea is getting existing infrastructure and tools to support this file format to enable non-technical users to be as comfortable creating and filling out a Web Bundle form as they are today with a PDF form. Overcoming the 27-year head start of PDFs and achieving cultural parity is the most difficult challenge here.

jyasskin commented 3 years ago

I like the use case overall.

The existing bundles proposal has the building blocks to have the form.wbn be able to save and load .json (or whatever format) files that would live next to the form, rather than changing the form itself. I guess PDF forms do constitute an existence proof that people have a use for form data embedded into the form file itself, which we'd need this new API to allow.

I'm inclined to treat this as a V2 problem: once we've shipped bundles that can be downloaded as an offline form, then we can talk to groups that use fillable PDFs and see if they'd be interested in switching to a more web-enabled format.

ioggstream commented 3 years ago

@jyasskin in Italy we are interested in this perspective and we are already experimenting with projects like jsonforms. We have not yet planned any PoC related to webpackage though. cc: @sebbalex @bfabio

EternityForest commented 3 years ago

I absolutely love this idea. In my opinion, Excel is one of the most important pieces of software of all time not because 2D grids are all that great, but because it let anyone make (theoretically) safe bundles of app and data.

This has use cases far beyond the obvious PDF forms replacement, because it can also be used as a way to configure an offline-first app, and then distribute that configuration. The power isn't so much as a forms replacement, but as an "App in a box" for pretty much any purpose.

Importantly, self-updating offline catalogs become possible, along with P2P chat and other distributed applications where your "account" is just a file that you have full control over.

But the downside of Excel is that you can't easily separate logic/forms and the actual data, so I very much like the idea of separate files that live "next to" the bundle.

I wonder if there might be some advantages to using a database, rather than a JSON file though. There seems to be a discomfort with ever including SQLite3 in a web standard, for fear of "codifying its quirks", but it is, ultimately, an insanely reliable, free, and near-universally trusted system.

A JSON file must be completely rewritten on every save. We already see frequently rewritten files being a cause of high disk activity.

A basic key/value store with JSON values would be far more flexible for making small updates.

Then again, this kind of thing can already be done via local storage APIs, so JSON files, or even just embedding everything right into the file, offers a very simple way to cover the obvious PDF forms use case.

Perhaps modifications could be written back to the bundle itself, for maximum drop-in effectiveness at being a PDF form, but with the UA having a notification that the site uses customized data, along with save/save as/clear custom data functionality, or even export/import custom data functions.

This would preserve the exact document model people are used to from any other site.

One problem with the whole thing, though, is origins. If anything like cookies or local storage gets used, it will totally break, because two different filled-out copies of the same form will share an origin, and probably confuse everything.

That could discourage anyone from using it for anything more interesting, along with the fact that one could accidentally share a customized version when they meant to share an uncustomized version.

Perhaps form-supporting bundles should be treated completely differently. Whenever you hit save, it should jump to a new file with a new extension, so it is clear that this is no longer a clean copy.

To get really fancy, the browser could generate a new keypair just for that modified bundle (I think people would get uncomfortable with global keypairs by default) stored in a file, and use it to sign the new modified bundle.

When loading the file, the origin of that document would become .originaldomain.com

This gives every document its own namespace for local APIs and the like. It puts a big scary blob of nonsense in the URL bar, preventing someone from distributing bad content in a modified form and convincing people that it came from the original source.

It also gives documents a persistent identity that can be preserved across new versions of the same document.

It provides some minimal level of authentication: someone who believes they have received a bogus update to a document can prove it, even though people probably won't be routinely checking the URLs.

There's no reason to limit this to just basic forms, when all kinds of interactive, offline first apps can be done this way.

The old TiddlyWiki is a perfect example. It used to use self-modifying HTML for exactly this kind of thing.

dgp1130 commented 3 years ago

> This has use cases far beyond the obvious PDF forms replacement, because it can also be used as a way to configure an offline-first app, and then distribute that configuration. The power isn't so much as a forms replacement, but as an "App in a box" for pretty much any purpose.

I really like this "App in a box" mentality. In my original thought process, I was trying to come up with an existing alternative technology that provides the same features as PDFs. I came to realize that PDFs aren't just strictly forms, but rather more complex applications that happen to target form use cases. In many cases, a user didn't want a simple form, they wanted an application that passes structured information from one person to another. Hence, the alternative application platform I identified was the web, and just filled in the blanks from there.

As you mentioned though, this isn't unique to forms and really applies to any file-based app. You could take any app and put it in a Web Bundle box to gain many of the advantages mentioned previously. This can be a much broader and more generic idea than just forms.