Radiance-Technologies / prism

Utilities for creating and curating a dataset for automated proof repair in the Coq Proof Assistant.
GNU Lesser General Public License v3.0
5 stars 2 forks source link

Define format and fields for project metadata #13

Closed a-gardner1 closed 2 years ago

a-gardner1 commented 2 years ago

Initial minimal list of prospective fields:

The commit SHA is optional to provide reusability of this format for local projects not yet associated with a repository. The ignore path regular expression is to prevent inclusion of external dependencies that are included as submodules or subdirectories (such as CompCert in VST, see #3). The clean and install commands are primarily for forward compatibility of this format with a live development environment. The Coq and SerAPI versions are included for forward compatibility with datasets that may contain commits across multiple versions of Coq.

Note that the above format is not necessarily representative of how the metadata will be stored in practice. Instead, each tuple of fields may be treated as a row in a database, and normalization may be employed to cut down on redundancy (for example, if multiple commits do not differ on any tuple other than their SHA, then their rows could potentially be collapsed to a single entry containing multiple SHAs).

rwhender commented 2 years ago

I'm not sure if this is covered elsewhere, but some potential additional fields would be

  1. a list of opam dependencies (if any) and
  2. a list of opam repositories that would need to be added to install those dependencies (if any).
a-gardner1 commented 2 years ago

Good idea. This might also be a good spot to list Coq dependencies as well.

juanerolon commented 2 years ago

Coq is written in OCaml; would this imply that there may be coq modules or packages written in ocaml, and also ocaml wrappers in other languages, e.g. python, etc. Would this mean list of repos that we are parsing for ocaml code? Also versioning info in the corresponding git logs?

a-gardner1 commented 2 years ago

Guys, here learning a bit about secure software development... from what I read Coq is written in OCaml; would this imply that there may be coq modules or packages written in ocaml, and also ocaml wrappers in other languages, e.g. python, etc. Would this mean list of repos that we are parsing for ocaml code? Also versioning info in the corresponding git logs?

The Coq compiler is written in Ocaml but Coq code itself (i.e., Gallina) is not. One can however extend its grammar with OCaml plugins (which is in fact how the language of tactics is defined. We consider repair of such plugins or other dependencies external to Coq proper to be out of scope for the foreseeable future. Tracking such dependencies if it helps in automating builds, however, is within scope for the metadata here.

a-gardner1 commented 2 years ago

An up-to-date metadata specification is provided and maintained in metadata/ProjectMetadataSpecifications.md