errorcorrectionzoo / eczoo_data

Content of the Error Correction Zoo, stored in structured YAML format
https://errorcorrectionzoo.org/

machine readable code information #269

Open Krastanov opened 2 years ago

Krastanov commented 2 years ago

It would be neat if, besides the human-readable description, there were some more formal (maybe executable) description of each code, e.g., as a parity check matrix.

I am setting up some simulation tools for QEC with QuantumClifford.jl and instead of creating a whole new database of codes in that library, it would be neat if I can just query this database in order to construct a parity check matrix for a given code.

Would that be considered "in scope" for the eczoo project?
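For concreteness, here is what such an executable description could boil down to for a single code: a hypothetical sketch (not an existing zoo feature; all names here are mine) that writes the well-known six stabilizer generators of the Steane [[7,1,3]] code as a binary symplectic matrix and numerically checks that they commute.

```python
import numpy as np

# The [7,4,3] Hamming parity checks; the Steane code is the CSS code built
# from this matrix for both the X and Z sectors.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

# CSS construction: X-type generators act on the X half of the symplectic
# representation, Z-type generators on the Z half.
Z = np.zeros_like(H)
stab = np.block([[H, Z],
                 [Z, H]])  # six generators, shape (6, 14)

# Generators commute iff all pairwise binary symplectic products vanish.
n = H.shape[1]
Lam = np.block([[np.zeros((n, n), int), np.eye(n, dtype=int)],
                [np.eye(n, dtype=int), np.zeros((n, n), int)]])
assert np.all((stab @ Lam @ stab.T) % 2 == 0)
print(stab.shape)  # (6, 14)
```

A machine-readable zoo entry could in principle store exactly this kind of matrix (or enough data to reconstruct it) alongside the prose description.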

valbert4 commented 2 years ago

You mean the stabilizer generator matrix for, say, a qubit stabilizer code? This sounds more like a code-tables thing, which was originally not in scope because of the existing tables at http://www.codetables.de/. Those do display such matrices for some codes. Would it be easier to enhance those tables into query-able databases instead?

Krastanov commented 2 years ago

Yes, I am referring only to the qubit codes with stabilizer tableau representations. The best of both worlds for me would be having the code table, but also having the human-readable meta data. The code tables above do not neatly list families of codes or their common names, rather they are just a big data-dump.

valbert4 commented 2 years ago

Thanks. How do you plan on writing routines to generate such matrices? And what's the scope of codes you'd like to cover? This gets quite complicated; e.g., surface codes and other geometrically local friends would require a geometry as input. I need to get @phfaist's two cents here, but this sounds like it could be a separate repo that generates and stores these matrices, which could then be linked to or (ambitiously, for small instances) presented in LaTeX form in the zoo data files. This is not what I personally would want to spend time on, but I don't think that should stop you from doing it :).

Krastanov commented 2 years ago

For the parameterized cases, it would be some generating function. E.g.

```
code1 = ToricCode(length1, length2)
code2 = HyperGraphProductCode(graph_object)
tableau1 = Stabilizer(code1)
tableau2 = Stabilizer(code2)
```

For parameterized codes, actual software would need to be written, but there is plenty of low-hanging fruit: codes that are not parameterized, or codes whose parameters are enumerated up to some small upper value in a database.

```
code3 = ECCZooDatabase("shor_code")
code4 = ECCZooDatabase("toric_10_10")
```

The foundation for much of this is available in https://github.com/Krastanov/QuantumClifford.jl
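For illustration, a generating function of the kind described above might look like the following sketch (the function name and edge-indexing conventions are mine, not an existing API): it builds the X-type (vertex) and Z-type (plaquette) parity check matrices of the L×L toric code, with qubits on the 2L² edges, and verifies the CSS commutation condition.

```python
import numpy as np

def toric_code_checks(L):
    """Hypothetical generator for the L x L toric code check matrices."""
    def h(i, j):  # horizontal edge at site (i, j), periodic boundaries
        return (i % L) * L + (j % L)
    def v(i, j):  # vertical edge at site (i, j)
        return L * L + (i % L) * L + (j % L)
    Hx = np.zeros((L * L, 2 * L * L), dtype=int)  # vertex (star) operators
    Hz = np.zeros((L * L, 2 * L * L), dtype=int)  # plaquette operators
    for i in range(L):
        for j in range(L):
            r = i * L + j
            for e in (h(i, j), h(i, j - 1), v(i, j), v(i - 1, j)):
                Hx[r, e] = 1
            for e in (h(i, j), h(i + 1, j), v(i, j), v(i, j + 1)):
                Hz[r, e] = 1
    return Hx, Hz

Hx, Hz = toric_code_checks(4)
# CSS condition: every X-type check commutes with every Z-type check.
assert np.all((Hx @ Hz.T) % 2 == 0)
print(Hx.shape, Hz.shape)  # (16, 32) (16, 32)
```

A database could then either call such a generator on demand or cache its output for the enumerated small sizes.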

valbert4 commented 2 years ago

It could be a useful thing to build this up in a separate repo, either within the zoo org or outside. Let's see if @phfaist thinks there are better ideas, especially if we want to think long term to see how this would link with the eczoo_data repo.

phfaist commented 2 years ago

Thanks @Krastanov for your suggestions and your enthusiasm! As @valbert4 mentioned, we're trying to strike a balance between structuring code information in the most useful way and keeping the highest possible flexibility and extensibility for new code schemes. For instance, storing a "distance" parameter for a code would sound like a great idea, but it is not immediately clear what that field would mean for a GKP code, or for an approximate error-correcting code. We're trying to avoid a situation where we'd end up adding many fields, each applicable only to a small subset of codes; the zoo wouldn't be as useful and those fields would be hard to maintain.

What is definitely possible is—as Victor suggested—to store information for specialized classes of codes separately; we can then add some calls to our zoo site generation scripts to fetch and include the relevant data on the relevant zoo code pages. If you'd like to manage a new repo that stores parity check matrices or other code table data in a structured, machine-readable form, we'd like to encourage you to do so! I could also imagine hosting this repository within the errorcorrectionzoo org account. We can coordinate regarding APIs and how information is fetched from your repo. You can refer to codes in the zoo by their code_ids, which are meant to be immutable.
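As a sketch of what such a separate, machine-readable store keyed by code_id might contain (the schema below is entirely hypothetical, with invented field names), each record could list stabilizer generators as Pauli strings, which consumers then parse into binary symplectic form:

```python
# Hypothetical record format for an external code-table repo, keyed by the
# zoo's immutable code_id. Field names are invented for illustration; the
# generators shown are the standard Steane-code ones.
record = {
    "code_id": "steane",
    "physical_qubits": 7,
    "logical_qubits": 1,
    "stabilizer_generators": [
        "IIIXXXX", "IXXIIXX", "XIXIXIX",   # X-type (Hamming checks)
        "IIIZZZZ", "IZZIIZZ", "ZIZIZIZ",   # Z-type
    ],
}

def pauli_to_symplectic(p):
    """Map a Pauli string to a binary (x | z) row."""
    x = [1 if c in "XY" else 0 for c in p]
    z = [1 if c in "ZY" else 0 for c in p]
    return x + z

rows = [pauli_to_symplectic(p) for p in record["stabilizer_generators"]]
print(len(rows), len(rows[0]))  # 6 14
```

Keeping the stored form human-legible (Pauli strings) while exposing a parsed matrix form would serve both the zoo pages and downstream simulation tools.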

We have a similar situation with volunteers for incorporating structured information about decoders, noise models, thresholds, and other benchmarks, and are planning to proceed in the same way. I'd imagine these two efforts are pretty orthogonal. But if there are efforts to standardize how some standard quantum codes (and related animals like decoders) are fully specified with machine-readable fields, it might be worth coordinating to make sure redundancies are avoided and potential pitfalls are detected earlier. @valbert4 what do you think?

valbert4 commented 2 years ago

Seems there really aren't many quantum code tables relative to classical ones (e.g., see refs in arXiv:2208.06832), so there is definitely room for such things. @artix41 @ehua7365 there might be useful overlap here. I can give Stefan his own repo, but it would be cool if there is cross-compatibility.

artix41 commented 2 years ago

We're currently working on a database + API that would fetch information about thresholds (we're making progress btw, we should have a proof-of-concept soon!), but we could in principle include more things in the API. Two comments come to mind:

  1. IBM told us that they are currently working on a sort of "official" code database, where each code would have a unique id and where numerical information about codes could be fetched using an API (Drew Vandeth is the contact point for that). So we might want to make sure we're not duplicating that effort.
  2. There are two things we can do regarding parity-check matrices: either storing the actual values for some given sizes in a database, or including generating functions for all sizes in the backend. The first option could work, but only for some given sizes (it might be fine in practice though). The second option would require communicating with existing frameworks that construct parity-check matrices (such as PanQEC, qecsim, or Qiskit QEC). The main problems I see with this are long-term maintainability, as frameworks constantly evolve, and the fact that it makes contributions much harder for users (while hard-coded parity-check matrices could simply be uploaded to the website). So I would recommend the first option (that's probably already what you had in mind), and that's definitely doable within the database format Eric and I are currently developing.

Since adding parity-check matrices is just another field in our database, we will try to add it in our proof-of-concept and see how it goes! Thanks for the suggestion!
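The first option above (storing actual values at given sizes) could be as simple as a lookup table keyed by code name and size. A toy sketch, with invented names and the classical repetition code purely as filler data:

```python
# Sketch of the "store actual values" option: check matrices hard-coded per
# (code name, size). All names here are invented for illustration.
STORED_CHECKS = {
    ("repetition", 3): ((1, 1, 0),
                        (0, 1, 1)),
    ("repetition", 5): ((1, 1, 0, 0, 0),
                        (0, 1, 1, 0, 0),
                        (0, 0, 1, 1, 0),
                        (0, 0, 0, 1, 1)),
}

def lookup_checks(code_id, size):
    """Return the stored parity check matrix, or fail loudly if absent."""
    key = (code_id, size)
    if key not in STORED_CHECKS:
        raise KeyError(f"no stored instance of {code_id!r} at size {size}")
    return STORED_CHECKS[key]

print(lookup_checks("repetition", 3))  # ((1, 1, 0), (0, 1, 1))
```

Failing loudly on sizes that aren't stored keeps the "only some given sizes" limitation explicit for downstream users.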

ChrisPattison commented 1 year ago

There's also the question of whether one stores parity check matrices or some further structure on top of them (e.g., the geometry underlying a surface code).

Unfortunately, there are cases where one really needs such structure to do something with the code. I'm mostly raising this as a potential consideration; I have no suggestions on how to address it.

ChrisPattison commented 1 year ago

Maybe a simple solution is to give links to known code generators, if they exist.

valbert4 commented 2 months ago

We could consider merging tables and/or the stab gen matrices given in the tables with the zoo in some way.