PGFinder 2.0 upgrade - Githubissues

Mesnage-Org / pgfinder

Peptidoglycan MS1 Analysis Tool

https://mesnage-org.github.io/pgfinder

GNU Lesser General Public License v3.0


PGFinder 2.0 upgrade #259

Open smesnage opened 3 months ago

smesnage commented 3 months ago

PGFinder 2.0.pptx

ns-rse commented 3 months ago

Hi @smesnage,

Do you have copies of the CSV files you would like users to be able to download please?

smesnage commented 3 months ago

Nope but I can make these. I will! You're referring to the databases for model organisms, right?

ns-rse commented 3 months ago

Morning,

There look to be two aspects to the Database builder...

Building Block components

This is the top box on the left hand side, and on the last slide there are screenshots of a couple of CSVs with the title ".csv files to download for users to amend".

If you're able to construct these I can start looking at getting them added.

Muropeptide list

This is the bottom box on the left hand side and lists...

E. coli (monomers; mDAP; D-Glu)
B. subtilis (monomers; mDAP(NH2), D-Glu)
C. difficile (monomers; GlcN, mDAP, D-Glu)
S. aureus (monomers; L-Lys, D-Gln, Gly5 lateral chain)
E. faecalis (monomers; L-Lys, D-Gln, Ala2 lateral chain)
E. faecium (monomers; L-Lys, D-Gln, Asp/Asn lateral chain)

It's possible these might be some of the files in lib/pgfinder/masses, but if not, or if there are now more available, it would be great if you could pass them on.

Cheers,

@ns-rse

ns-rse commented 2 months ago

Hi @smesnage

I've had a look through the files under various data directories in the repository and can only find masses for...

lib/pgfinder/masses/c_diff_monomers_complex.csv
lib/pgfinder/masses/c_diff_monomers_non_redundant.csv
lib/pgfinder/masses/c_diff_monomers_simple.csv
lib/pgfinder/masses/e_coli_monomers_complex.csv
lib/pgfinder/masses/e_coli_monomers_non_redundant.csv
lib/pgfinder/masses/e_coli_monomers_simple.csv

But these look slightly different to the examples in the last slide as they are the masses of the items in the orange box.

I was wondering though, can I take the Structure column from these files and construct at least the above orange box? The top 20 lines of the E coli files look like the following.

❱ bat -r 0:20  lib/pgfinder/masses/e_coli_monomers_* 
───────┬──────────────────────────────────────────────────────────────
       │ File: lib/pgfinder/masses/e_coli_monomers_complex.csv
───────┼──────────────────────────────────────────────────────────────
   1   │ Structure,Monoisotopicmass
   2   │ gm|0,498.206090
   3   │ gm-gm|0,976.385965
   4   │ gm-gm-gm|0,1454.565840
   5   │ gm-A|1,569.243204
   6   │ gm-AE|1,698.285797
   7   │ gm-AEJ|1,870.370589
   8   │ gm-AEJA|1,941.407703
   9   │ gm-AEJC|1,973.379774
  10   │ gm-AEJD|1,985.397532
  11   │ gm-AEJE|1,999.413182
  12   │ gm-AEJF|1,1017.439003
  13   │ gm-AEJG|1,927.392053
  14   │ gm-AEJH|1,1007.429501
  15   │ gm-AEJI|1,983.454653
  16   │ gm-AEJK|1,998.465552
  17   │ gm-AEJL|1,983.454653
  18   │ gm-AEJM|1,1001.411074
  19   │ gm-AEJN|1,984.413516
  20   │ gm-AEJP|1,967.423353
───────┴──────────────────────────────────────────────────────────────
───────┬──────────────────────────────────────────────────────────────
       │ File: lib/pgfinder/masses/e_coli_monomers_non_redundant.csv
───────┼──────────────────────────────────────────────────────────────
   1   │ Structure,Monoisotopicmass
   2   │ gm|0,498.206090
   3   │ gm-gm|0,976.385965
   4   │ gm-gm-gm|0,1454.565840
   5   │ gm-A|1,569.243204
   6   │ gm-AE|1,698.285797
   7   │ gm-AEJ|1,870.370589
   8   │ gm-AEJA|1,941.407703
   9   │ gm-AEJC|1,973.379774
  10   │ gm-AEJD|1,985.397532
  11   │ gm-AEJE|1,999.413182
  12   │ gm-AEJF|1,1017.439003
  13   │ gm-AEJG|1,927.392053
  14   │ gm-AEJH|1,1007.429501
  15   │ gm-AEJI|1,983.454653
  16   │ gm-AEJK|1,998.465552
  17   │ gm-AEJM|1,1001.411074
  18   │ gm-AEJN|1,984.413516
  19   │ gm-AEJP|1,967.423353
  20   │ gm-AEJQ|1,998.429167
───────┴──────────────────────────────────────────────────────────────
───────┬──────────────────────────────────────────────────────────────
       │ File: lib/pgfinder/masses/e_coli_monomers_simple.csv
───────┼──────────────────────────────────────────────────────────────
   1   │ Structure,Monoisotopicmass
   2   │ gm|0,498.206090
   3   │ gm-gm|0,976.385965
   4   │ gm-gm-gm|0,1454.565840
   5   │ gm-A|1,569.243204
   6   │ gm-AE|1,698.285797
   7   │ gm-AEJ|1,870.370589
   8   │ gm-AEJA|1,941.407703
   9   │ gm-AEJC|1,973.379774
  10   │ gm-AEJD|1,985.397532
  11   │ gm-AEJE|1,999.413182
  12   │ gm-AEJF|1,1017.439003
  13   │ gm-AEJG|1,927.392053
  14   │ gm-AEJH|1,1007.429501
  15   │ gm-AEJI|1,983.454653
  16   │ gm-AEJK|1,998.465552
  17   │ gm-AEJM|1,1001.411074
  18   │ gm-AEJN|1,984.413516
  19   │ gm-AEJP|1,967.423353
  20   │ gm-AEJQ|1,998.429167
───────┴──────────────────────────────────────────────────────────────
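
As a rough sketch of taking the Structure column from one of these files, the following uses Python's standard csv module. It assumes only the two-column header shown in the listing above (Structure, Monoisotopicmass); the function name read_structures is hypothetical and not part of PGFinder.

```python
import csv
import io


def read_structures(handle):
    """Return the Structure column from a masses CSV opened as a text handle."""
    reader = csv.DictReader(handle)
    return [row["Structure"] for row in reader]


# A few rows copied from the e_coli_monomers_simple.csv listing above.
sample = io.StringIO(
    "Structure,Monoisotopicmass\n"
    "gm|0,498.206090\n"
    "gm-gm|0,976.385965\n"
    "gm-AEJ|1,870.370589\n"
)
print(read_structures(sample))
```

Passing an open file handle rather than a path keeps the sketch independent of where the CSVs end up living in the package.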

If I'm understanding correctly the work here is to build these from their components which are defined in the green CSV...

And the items in the orange CSV get constructed automatically from the weights of the components in the green?

Aside from providing default files for people to customise, I think we'll need to add the functionality to construct the masses of the items in the orange table from their constituents in the green table.

If you could let me know if I've understood this correctly that would be great and if the sample structures for E coli and C difficile are incorrect I'd need those as well as the sample masses for green.

Cheers,

@ns-rse

smesnage commented 2 months ago

You're correct. The "green table" corresponds to the description of the "alphabet" used to define peptidoglycan fragments (in the "orange table"). And yes, the example (orange list) in the last slide is not entirely correct. I forgot to add the suffix (|0, |1, |2) for some structure to describe their oligomerisation state; the list does not correspond to E. coli because I wanted to give an example for the user to modify that contains more diverse structures (glycan chains, modified fragments). So it needs to be fixed but I'd rather give this example than the E. coli one!

I hope it makes sense.

ns-rse commented 2 months ago

Thanks @smesnage that makes sense.

If you could mail me the CSV from which these were culled I'll be able to use those as a basis to get things set up. It doesn't matter too much at the moment if labels aren't exactly correct, we can refine that once the workflow is in place.

Cheers,

@ns-rse

smesnage commented 2 months ago

Example list of muropeptides to fragment.xlsx

Reference masses for mass calculator EXAMPLE.xlsx

smesnage commented 2 months ago

.csv versions below

Reference masses for mass calculator EXAMPLE.csv

Example list of muropeptides to fragment.csv

ns-rse commented 2 months ago

Thanks @smesnage :+1:

ns-rse commented 2 months ago

Hi @smesnage ,

Been working on this and have a decent idea of how to proceed but some questions about the example you shared.

Muropeptides

These are the structures...

Structure
gm |0
gm-gm |0
gm(Anh) |1
gm(-Ac) |1
gm-AEJ |1
gm-AEJA |1
gm-AEJG |1
gm-AEJAG |1
gm-AEJKR |1
gm-AEJ (Anh) |1
gm-AEJA (Anh) |1
gm-AEJA (-Ac) |1
gm-gm-AEJA |1
gm-AEJA=gm-AEJA (4-3) |2
gm-AEJA=gm-AEJ (3-3) |2
gm-AEJ=gm-AEJ (3-3) |2
gm-AEJKR=gm-AEJA (4-3) |2
gm-AEJA=gm-AEJA (Anh) (4-3) |2
gm-AEJAG=gm-AEJA (4-3) |2
gm-AEJKR=gm-AEJ (Anh) (3-3) |2
gm-gm-AEJA=gm-AEJA (4-3) |2
gm(-Ac)-AEJA=gm-AEJA (4-3) |2
gm(+Ac)-AEJA=gm-AEJ (Anh) (3-3) |2

Reference Masses

These are the reference masses (with empty column and description stripped as they're not needed for the task I'm trying to achieve).

g,203.07937
m,277.1162
g(+Ac),245.08994
m(+Ac),319.12672
g(-Ac),161.06881
m(-Ac),235.10559
Lac,72.02113
A,71.03714
C,103.00919
D,115.02694
E,129.04259
F,147.06841
G,57.02146
H,137.05891
I,113.08406
J,172.08479
K,128.09496
M,131.04049
N,114.04293
P,97.05276
Q,128.05858
R,156.10111
S,87.03203
T,101.04768
V,99.06841
W,186.07931
Y,163.06333
H2O,18.0106
B,118.07423
O,132.08987
X,163.04807
U,
Z,

Approach

Take a structure.
Break it down into components.
Look up the mass of the components.
Sum the masses.

Simple Example 1

gm
g and m
g : 203.07937 and m : 277.1162
gm : 480.19557

I can calculate this ok.

Simple Example 2

gm-AEJ |1
g, m, A, E, J
g: 203.07937, m: 277.1162, A: 71.03714, E: 129.04259, J: 172.08479
gm-AEJ |1: 852.36009

I can calculate this ok.
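
The two simple examples can be reproduced with a few lines of Python. This is only a sketch of the naive approach: it uses a handful of the reference masses from the list above, drops everything from the | suffix onwards, skips the - separator and treats every remaining character as one building block. The names REFERENCE_MASSES and naive_mass are made up for illustration.

```python
# Subset of the reference masses listed above (residue masses in Da).
REFERENCE_MASSES = {
    "g": 203.07937,
    "m": 277.1162,
    "A": 71.03714,
    "E": 129.04259,
    "J": 172.08479,
}


def naive_mass(structure):
    """Sum the reference masses of single-character building blocks."""
    head = structure.split("|")[0]  # drop the |0 / |1 / |2 suffix
    blocks = [char for char in head if char not in "- "]
    return sum(REFERENCE_MASSES[block] for block in blocks)


print(round(naive_mass("gm"), 5))
print(round(naive_mass("gm-AEJ |1"), 5))
```

Because every character is looked up individually, this approach raises a KeyError on anything containing brackets, which is where the simple cases stop.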

Things get complicated!

gm-AEJA=gm-AEJA (Anh) (4-3) |2
g, m, A, E, J
2 x g: 203.07937, 2 x m: 277.1162, 4 x A: 71.03714, 2 x E: 129.04259, 2 x J: 172.08479
gm-AEJA=gm-AEJA (Anh) (4-3) |2 : 1846.79446

I've not sussed this one out yet because decomposing the muropeptides into their components requires splitting the string. This can be done very simply by splitting every character but that is overkill and the reference masses have things like g(+Ac) / g(-Ac) / m(+Ac) / m(-Ac) which we would need to capture anyway.

This means we need to leverage Regular Expressions to split the strings of the muropeptides into their building blocks.

Regular expressions look for patterns on which to split. Some obvious ones are - and =, but that raises the question of whether components such as (Anh), (-Ac), (4-3) and |1/|2/|3 have any significance?

If not, that simplifies things: we can focus on splitting everything up to the first space in the muropeptide encoding and ignore everything after it. There is one instance where (-Ac) occurs at the end (gm-AEJA (-Ac) |1), but if that can be ignored it's not too bad, as it's then just a case of capturing m(-Ac) and related blocks.

I noticed also there is gm(Anh) but not m(Anh) in the reference masses.
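
One way the splitting could be approached with a regular expression is sketched below. It is not the smithereens grammar, just an illustration for the strings in this comment: g(+Ac)-style blocks are kept whole, anything else in brackets such as (Anh) or (4-3) is kept as a single token, and the |n suffix is dropped. TOKEN and tokenize are hypothetical names.

```python
import re

# Order matters: try the acetylation variants first, then any other
# bracketed group, then single letters and the - / = separators.
TOKEN = re.compile(r"[gm]\([+-]Ac\)|\([^)]+\)|[A-Za-z]|[-=]")


def tokenize(structure):
    """Split a muropeptide string into building blocks and separators."""
    head = structure.partition("|")[0]  # ignore the |0 / |1 / |2 suffix
    return TOKEN.findall(head)          # whitespace is skipped implicitly


print(tokenize("gm(-Ac)-AEJA=gm-AEJA (4-3) |2"))
print(tokenize("gm(Anh) |1"))
```

Keeping (Anh) and (4-3) as tokens rather than discarding them leaves the decision about their significance to a later step.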

smesnage commented 2 months ago

That's a bit more complicated than it seems. Explanations are below to calculate these structures. But the worrying thing is that you should not bother about this because Brooks has already written the piece of code that does this? Let me get back to Brooks to make sure things are not done twice but I thought he gave you access to his masscalc code. Unless this is just a sanity check for you to figure out if you understand how things are done?

Simple Example 2

gm-AEJ |1
g, m, A, E, J
g: 203.07937, m: 277.1162, A: 71.03714, E: 129.04259, J: 172.08479
gm-AEJ |1: 852.36009 + a molecule of water (each residue is the mass of an amino acid engaged in a bond, the last one needs an extra mass)

gm-AEJA=gm-AEJA (Anh) (4-3) |2
g, m, A, E, J
2 x g: 203.07937, 2 x m: 277.1162, 4 x A: 71.03714, 2 x E: 129.04259, 2 x J: 172.08479
gm-AEJA=gm-AEJA (Anh) (4-3) |2 : 1846.79446

Need to add 1 molecule of water (18 Da-ish) for a "free residue" only (the other one on the other peptide stem is in a bond). Then the Anh modification implies you need to take away a molecule of water + 2 hydrogens (20 Da-ish).
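
Applying those two corrections to the sums quoted above gives the following worked sketch. The water mass is the H2O entry from the reference masses; the hydrogen mass (1.00783 Da, monoisotopic) is not in that list and is added here as an assumption, as are the variable names.

```python
H2O = 18.0106       # from the reference masses in this thread
HYDROGEN = 1.00783  # monoisotopic hydrogen; assumed, not in the reference list

# Sums of residue masses as calculated earlier in the thread.
MONOMER_RESIDUES = 852.36009  # gm-AEJ |1
DIMER_RESIDUES = 1846.79446   # gm-AEJA=gm-AEJA (Anh) (4-3) |2

# Each residue mass is for a residue engaged in a bond, so one water is
# added for the free end of the structure.
monomer_mass = MONOMER_RESIDUES + H2O

# The dimer gets one water for its single free residue, then the Anh
# modification removes a water plus two hydrogens (about 20 Da).
anh_offset = -(H2O + 2 * HYDROGEN)
dimer_mass = DIMER_RESIDUES + H2O + anh_offset

print(f"gm-AEJ |1: {monomer_mass:.5f}")
print(f"gm-AEJA=gm-AEJA (Anh) (4-3) |2: {dimer_mass:.5f}")
```

The monomer comes out at about 870.3707, which is close to the 870.370589 listed for gm-AEJ|1 in the E. coli mass files.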

ns-rse commented 2 months ago

Thanks for the quick response @smesnage

So the |1 / |2 / |3 / (Anh) / (4-3) are significant. It looks like this is a grammar/dictionary, are the rules written down anywhere? Is it the notation referenced here? That seems to lack the detail about how to handle things like (Anh) and other components.

@TheLostLambda has indeed given me access to his work and I copied it to the repository Mesnage-Org/smithereens.

It's written in a language I don't know (Rust) so I've not been able to follow what it's doing and get my head round it properly. I could see that it's using atomic masses of elements, accounting for different isotopes, and that there is a muropeptide module in there too.

@TheLostLambda do you have any documentation on how to use smithereens please? I know the basics: I have cloned it, run cargo build to build it, tried running the tests (seems there aren't any unless I'm missing something) and installed it, but I have no idea how to use it, let alone start interfacing it with Python/Svelte...

❱ smithereens       
Molecule: gm
  × expected a chemical formula (optionally followed by a '+' or '-' and a particle offset), or a standalone particle offset
  ├─▶   × expected a particle (like p or e), optionally preceded by a number
  │   
  ╰─▶   × the particle "g" could not be found in the supplied atomic database
        help: double-check for typos, or add a new entry to the atomic database

   ╭────
 1 │ gm 
   · ┬
   · ╰── particle not found
   ╰────

Molecule: gm-gm
  × expected a chemical formula (optionally followed by a '+' or '-' and a particle offset), or a standalone particle offset
  ├─▶   × expected a particle (like p or e), optionally preceded by a number
  │   
  ╰─▶   × the particle "g" could not be found in the supplied atomic database
        help: double-check for typos, or add a new entry to the atomic database

   ╭────
 1 │ gm-gm 
   · ┬
   · ╰── particle not found
   ╰────

Molecule: gm-AEJ
  × expected a chemical formula (optionally followed by a '+' or '-' and a particle offset), or a standalone particle offset
  ├─▶   × expected a particle (like p or e), optionally preceded by a number
  │   
  ╰─▶   × the particle "g" could not be found in the supplied atomic database
        help: double-check for typos, or add a new entry to the atomic database

   ╭────
 1 │ gm-AEJ 
   · ┬
   · ╰── particle not found
   ╰────

Molecule: H2O
Monoisotopic Mass: 18.010565
Average Mass: 18.0153
Charge: 0

Molecule: CH4
Monoisotopic Mass: 16.031300
Average Mass: 16.0425
Charge: 0

Molecule: C6H12O6
Monoisotopic Mass: 180.063388
Average Mass: 180.1561
Charge: 0

Molecule:

I can get some basic output from it, but how do I access the muropeptide builder side of things? As well as there being no tests, documentation appears thin on the ground.

As an aside...

As neat and fast as Rust is, using multiple languages adds another layer of complexity to the development and long term maintenance of software (particularly as people with limited programming experience may be involved in the future). I'm not averse to this where it's required, but I'm not sure the advantages of Rust (memory safe and fast due to being compiled) are of any benefit here. I'm happy to be convinced otherwise though.

TheLostLambda commented 2 months ago

Hi @ns-rse ! Sorry for the delayed response!

First bit that might be confusing: the main branch of smithereens is a sort of third-pass at this problem (since there is so much complexity in structure parsing, some in mass calculation, and a very large amount in fragmentation for MS2). I have one (incomplete) version in Python that hit a dead end, and another (also incomplete) version in Rust which lives (only) in the wasm-pilot branch. The main branch does not currently contain any code for PG structural parsing, mass calculation, or fragmentation.

With that being said, the most up-to-date grammar exists here: https://github.com/TheLostLambda/smithereens/tree/main/grammar — the HTML is very nice to look at. I have more information about the semantics of these structures elsewhere, but that's something that I still need to implement in code.

For the purposes of getting this PGFinder 2.0 out and to unblock a paper, I think all you'll need is to copy this pre-compiled pkg/ directory and use it via the web-app: https://github.com/TheLostLambda/smithereens/tree/wasm-pilot/pkg

Once this third-iteration of code is done, it should be drop-in replacement for the pkg/ directory currently on wasm-pilot and will add support for more advanced structures than that old version currently supports.

On Rust, I'll admit this is me selfishly trying to reduce the amount of duplicate work done in Python. For PGFinder (the Python application + web UI), it is annoying to introduce this third language into the mix, but the reasons I think it's currently the best path are:

Whilst mass-calculation isn't too intensive, fragment calculation is much more so and I hit a practical wall trying to get past GIL issues in Python and even the single-threaded, naive re-implementation in Rust was several orders of magnitude faster than the Python version.
Since my current work on this PG Language + Mass Calc + Fragmentation code belongs to a larger automated MS2 analysis tool, it's beginning to become properly complex software that I think (admittedly in my very personal opinion) Python makes more difficult to write correctly (with significantly fewer static guarantees and a comparative lack of clear best-practices).
Rust has this relatively new MS search engine tool: https://github.com/lazear/sage which I've been coordinating with the developer to integrate into my pipeline — keeping everything in Rust.
All of those are reasons I've chosen to embark on this greater project using Rust, but the reason I think it's acceptable to temporarily add some complexity to PGFinder is because I see this tool / pipeline superseding PGFinder. SAGE will hand us, out of the box, much of what PGFinder can do and more, requiring only the reimplementation of the modification / cross-linking / consolidation logic.

In the long-term then, I'm planning to avoid having too many languages by replacing Python with Rust, and this current three-language state is a compromise to get PGFinder 2.0 and the corresponding paper out before that greater project is complete (which could be some months still).

I'm happy to chat more or have a meeting if I sound crazy or if people have more questions, but that's how I've been thinking of things at least!

ns-rse commented 2 months ago

Sorry for the delayed response!

Asynchronicity is fine by me and 7hrs isn't long.

Thanks for the update on all these aspects, I'll check out the branch and start working out how to do things. How are the artefacts in that folder created?

As I wrote, I'm not averse to using Rust going forward, and the fact that there are existing tools being linked in and leveraged is a good reason.

My perspective is very much about maintainability of the code, you won't always be around or available to work on issues and so regardless of the language used it will be important to have not just a working tool but also...

Documentation - on usage and how to use and build the tools.
Unit tests - to ensure the code does what it's expected to and doesn't break when modified.
Continuous Integration - automating the unit testing, building and deploying websites.

In this regard the following guidelines used in reviewing software are useful...

One obvious omission I've noticed is the lack of a license applied to smithereens, which means it's technically not openly licensed (the default if there is nothing explicit).

TheLostLambda commented 2 months ago

Hi @ns-rse!

The artefacts in that folder are created by wasm-pack following the workflow described by https://rustwasm.github.io/docs/book/ !

I agree with the importance of maintainability here! I'm making a large effort in that regard with this third iteration. You need to use the command in the justfile, but I've got more than 100 unit tests so far and am using a code-coverage tool to ensure that every branching path in the code is tested (not just lines).

I've not yet put that workflow into CI form, since it's currently very much in single-developer prototype mode, and documentation hasn't been written just yet because it's not clear what will be public API just yet! The code is split into several stand-alone libraries that will eventually be pulled together by a single application, but most of these parts will be reusable for other purposes.

Thanks for pointing out the license! I actually didn't know that it defaulted to something closed!

smesnage commented 2 months ago

Just a quick comment to follow up on Brooks' messages.

We're in quite a special situation with the addition of the masscalc and fragment predictor in the WebUI. Basically, we need to include them in PGFinder in order to publish the paper we have in our drawer (which used the fragment predictor). However, a better tool is in the making so at some point, both will be replaced. Not quite sure when this will happen though (it could take a while!) so I would rather not wait.

Given this preamble, I suggest taking a pragmatic approach: 1) we make the magic happen and plug the two "modules" into PGFinder using the existing code that Brooks has built 2) we sort out documentation and unit tests (minimal effort, no one will read these anyway, the WebUI should be dumbproof) 3) we don't bother too much about the elegance of the code, all we need is something that works.

Sometimes a dirty job is good enough!

ns-rse commented 2 months ago

Hi @TheLostLambda and @smesnage

Apologies for the slight delay. I juggle multiple projects and have a bad habit of getting sucked in and spending too much time on one thing when I should be working on others, so I'm trying to time-box work to specific days. Wednesday is my PGFinder day, so I will forge ahead tomorrow with reviewing the PR and understanding WebAssembly.

Wanted to say though that it's good to hear you're working on the tests and documentation of everything @TheLostLambda, and that I understand the need to get something working out @smesnage.

ns-rse commented 1 month ago

I've made some progress with my understanding of JavaScript/TypeScript/HTML/CSS/Svelte/Rust/WebAssembly

Work-in-progress is on the ns-rse/259-muropeptides-fragment branch for curious parties. There is not much on the Rust/WebAssembly side as I'm finding that massively confusing, so I've been focusing on documentation (writing down what I've been doing, as I'm mindful someone new is likely to be working on this later in the year so documentation is going to be really important) as well as getting the WebUI elements in place (although they're not working yet as I've not sussed that all out).

In checking what is required, though, I see that in the mock-up PowerPoint slides the "Muropeptide list" has some items under "Built-In" which I'm not sure about and wanted to check.

For the target structures I have a file with...

Structure
gm |0
gm-gm |0
gm(Anh) |1
gm(-Ac) |1
gm-AEJ |1
gm-AEJA |1
gm-AEJG |1
gm-AEJAG |1
gm-AEJKR |1
gm-AEJ (Anh) |1
gm-AEJA (Anh) |1
gm-AEJA (-Ac) |1
gm-gm-AEJA |1
gm-AEJA=gm-AEJA (4-3) |2
gm-AEJA=gm-AEJ (3-3) |2
gm-AEJ=gm-AEJ (3-3) |2
gm-AEJKR=gm-AEJA (4-3) |2
gm-AEJA=gm-AEJA (Anh) (4-3) |2
gm-AEJAG=gm-AEJA (4-3) |2
gm-AEJKR=gm-AEJ (Anh) (3-3) |2
gm-gm-AEJA=gm-AEJA (4-3) |2
gm(-Ac)-AEJA=gm-AEJA (4-3) |2
gm(+Ac)-AEJA=gm-AEJ (Anh) (3-3) |2

Do these cover all of the six species listed or is there meant to be a single file for each species and users are then able to select which or upload their own custom file?

smesnage commented 1 month ago

Hi,

I reply here because I have no clue where to find this on Github. Total mystery.

The list below is just to provide the user with a sample of structures that show the syntax that describes peptidoglycan fragments. There were some mistakes but I have checked with Brooks and the list below in bold is the right one. Ooops.

Structure
gmgm    CHANGED: drop the - or use ~ instead
gm(Anh)
gm(DeAc)
g(DeAc)m
gm (DeAc)
gm-AEJ
gm-AEJA
gm-AEJG
gm-AEJAG
gm-AEJKR
gm-AEJ (Anh)    CHANGED: needed a space!
gm-AEJA (Anh)    CHANGED: needed a space!
gmgm-AEJA    CHANGED: drop the - or use ~ instead
gm-AEJA=gm-AEJA (4-3)
gm-AEJ=gm-AEJA (3-3)    CHANGED: wrong order, 3-3 implies the donor is the left / first structure
gm-AEJ=gm-AEJ (3-3)
gm-AEJA=gm-AEJKR (4-3)    CHANGED: wrong order, 4-3 implies the donor is the left / first structure
gm-AEJA=gm-AEJA (Anh) (4-3)
gm-AEJA=gm-AEJAG (4-3)    CHANGED: wrong order, 4-3 implies the donor is the left / first structure
gm-AEJ=gm-AEJKR (Anh) (3-3)    CHANGED: wrong order, 3-3 implies the donor is the left / first structure
gmgm-AEJA=gm-AEJA (4-3)    CHANGED: drop the - or use ~ instead
gm(DeAc)-AEJA=gm-AEJA (4-3)    CHANGED: -Ac to DeAc
gm-AEJ=gm(Ac)-AEJA (Anh) (3-3)    CHANGED: -Ac to DeAc and wrong order, 3-3 implies the donor is the left / first structure

The rules are as follows: the Anh and DeAc modifications apply to the residue that precedes them OR can be on either residue if you add a space:
g(DeAc)m means that g is deacetylated, m is not
gm(DeAc) means m is deacetylated, g is not
gm (DeAc) means either g or m is deacetylated
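
The attachment rule can be sketched in Python as follows. This is an illustration of the rule as stated in this comment, not Brooks' implementation; the function name is hypothetical and the set of modification labels (Anh, DeAc, Ac) is taken from the examples here.

```python
import re

# A modification in brackets, optionally preceded by whitespace.
MODIFICATION = re.compile(r"(\s*)\((Anh|DeAc|Ac)\)")


def modification_sites(structure):
    """Return (modification, residue) pairs; residue is None when ambiguous."""
    sites = []
    for match in MODIFICATION.finditer(structure):
        gap, name = match.groups()
        if gap or match.start() == 0:
            # A space (or nothing at all) before the bracket means the
            # position of the modification is not specified.
            sites.append((name, None))
        else:
            # No space: the modification belongs to the preceding residue.
            sites.append((name, structure[match.start() - 1]))
    return sites


for example in ("g(DeAc)m", "gm(DeAc)", "gm (DeAc)"):
    print(example, modification_sites(example))
```

Cross-link descriptors such as (4-3) are deliberately not matched, so they would need handling separately.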

Brooks' script will be able to spot mistakes and fix them so let's provide this model syntax and let's not go over the top with documentation for now. I can certainly do it and I would rather you spent your time on stuff that only you can do ($$$$$...).

Let me know if you have any questions, I hope this is helping!

I don't understand the question below: Do these cover all of the six species listed or is there meant to be a single file for each species and users are then able to select which or upload their own custom file?

I believe these examples are representative of the diversity of fragments that users will include in their database.

Let me know if you have any questions!

ns-rse commented 1 month ago

Hi @smesnage

Thanks for that. I think the finer details of what is included can be wrangled at a later date, I was just curious whether there would be multiple files from which users could choose or not (as is the case in the existing functionality) as that influences how the dialogue would be created.

I feel like I'm slowly getting the hang of how the website framework functions and hope to have the layout in place after another day on it next week (hooking it in so that the "Build database" button does what it needs to, leveraging the smithereens programme, would be the next step after that. I'm trying to walk before running!).

smesnage commented 1 month ago

Ah, I understand the question now. In fact you already asked and the answer is yes, I need to provide these for model organisms:

Bacillus subtilis
Staphylococcus aureus
Enterococcus faecalis
Enterococcus faecium St

ns-rse commented 1 month ago

Cool, thanks for the confirmation.

I can simply make dummy files for the time being and they can be replaced once you've got them ready.

ns-rse commented 1 month ago

@TheLostLambda : I've hit an impasse with Svelte and am unsure how to proceed. Web development is not something I've done before (nor are the JavaScript/TypeScript languages) so your advice and guidance would be very much appreciated.

Things I have Understood

How to include data

The mass data is loaded by functions in the pgfinder.gui.internal.py, these look specific to each file type so I've attempted to generalise them to a single function with a view to needing to load other file types (reference masses and target structures). The metadata about these are stored in index.json files within the directory. I've therefore created two new directories lib/pgfinder/reference_masses/ and lib/pgfinder/target_structures/ to hold these data and added the necessary metadata to JSON files.
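
A single loader of the kind described might look something like the sketch below. The function name load_index and its arguments are hypothetical, and nothing is assumed about the contents of index.json beyond it being valid JSON; the actual functions in pgfinder.gui.internal may be organised quite differently.

```python
import json
from pathlib import Path

# Directory names mentioned above, relative to lib/pgfinder/.
LIBRARY_DIRS = ("masses", "reference_masses", "target_structures")


def load_index(base_dir, library):
    """Read the index.json metadata for one kind of library."""
    if library not in LIBRARY_DIRS:
        raise ValueError(f"unknown library type: {library}")
    index_path = Path(base_dir) / library / "index.json"
    with index_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Taking the library type as an argument means the same function could serve the existing masses as well as the two new directories.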

How to add a card

Top-level page layout is in +page.svelte which includes child documents, MsDataUploader.svelte and a newer FragmentsDataUploader.svelte which I intend to be the required card shown in the PowerPoint slides linked in the first comment.

FragmentsDataUploader.svelte has copied the structure of the MsDataUploader.svelte and there are two sections one for Building block components (aka Reference Masses) and one for Muropeptide List (aka Target Structures). These don't yet load the sample files.

Generalise some `lib/pgfinder/gui/.py` functions

I've started attempting to generalise the functions here and write simple tests to check they return what is required.

Things I haven't Understood/Got working

I've found the layout of everything to be quite confusing. There appears to be a Python function pyio defined within TypeScript of the Svelte framework which is loaded by the pgfinder.gui.shim.run_analysis() function and I don't understand why that is done like that rather than passing the data across.
I've not yet been able to get the TypeScript code in web/src/lib/pgfinder.ts to correctly return the data for reference masses nor target structures.
I've attempted to call the functions and objects I've introduced by adding lines to this script but it just results in everything hanging and I can not see what is going on in the background with the Python code that is being called and have no idea how to find this out. The code is left in place but commented out.

Once the files can be uploaded from the Python package I can then start looking at incorporating them into the smithereens WebAssembly compiled binaries.

`smithereens`

I spent some time trying to build the binaries myself on my local system in order to document the process, but the build failed both on my system and using the WebAssembly framework. I've seen there has been a lot of activity on this repository (over 200 commits last time I looked) and long term having it build automatically will be imperative, but I'm not pursuing this further myself and am instead focusing on the WebUI so I can then use the provisioned binaries.

Current state of WebUI

On the branch ns-rse/259-muropeptides-fragment the website builds and runs but the card for loading the defaults doesn't ever render the metadata to allow users to select a Reference Mass or Target Structures.

Documentation

Given the lack of documentation and the paucity of my familiarity with this framework, I've attempted to sketch out the structure of how components are interacting in a Mermaid state diagram.

The layout isn't perfect but the code is stored in a gist. I've added it to docs/contributing.md, where I've been making notes so far, because it's going to be really, really important that the current state is well documented for the new PhD student to get started with PGFinder (these can be removed or moved prior to merging). This resides on the ns-rse/259-muropeptides-fragment branch but can also be viewed at mermaid.live, which might be easier as it affords the ability to pan and zoom, although maybe the embedded image below is just as easy to use.

%%{init: {'theme': 'forest'
         }
}%%

    stateDiagram-v2

    classDef data fill:cyan
    classDef webui fill:pink
    classDef processing fill:yellow

    state Svelte {

    MsDataUploader --> +page.svelte
    note right of +page.svelte 
        Lays out the Page

        Includes Typescript that imports :
          + AdvancedOptions from ./AdvancedOptions.svelte
          + Footer from ./Footer.svelte
          + Header from ./Header.svelte
          + LinksAndDownloads from ./LinksAndDownloads.svelte
          + MassLibraryUploader from ./MassLibraryUploader.svelte
          + MsDataUploader from ./MsDataUploader.svelte

        Calls PGFinder (which runs under WebAssembly) and loads :
          + pgfinderVersion
          + allowedModifications
          + massLibraries
          + fragmentsLibraries (in progress)
          + muropeptidesLibraries (in progress)

        When state is ready :
          + runs analysis calling the pgfinder.postMessage(pyio) from pgfinder.ts 
    end note
    AdvancedOptions --> +page.svelte
    Footer --> +page.svelte
    Header --> +page.svelte
    LinksAndDownloads --> +page.svelte
    FragmentDataUploader --> +page.svelte
    +page.svelte --> pgfinder
    pgfinder --> +page.svelte
    web/src/app.d.ts --> FragmentDataUploader
    web/src/app.d.ts --> MsDataUploader
    note left of web/src/app.d.ts
        TypeScript file defining types of different elements.

        Key is Pyio which contains the results of different stages of loading 
        and processing data.
    end note

    pyio --> gui.shim
    note right of pyio
    Some Python code written within TypeScript that passes parameters???
    end note
    }
    state pgfinder {
        pgio.theo_masses_reader() --> gui.internal
        pgio.ms_file_reader() --> gui.internal
        gui.internal --> gui.shim
        gui.shim --> run_analysis()
        gui.shim --> load_libraries()
        note right of load_libraries()
        This is meant to load the reference masses and target structures
        and return them to the 

        CURRENTLY DOES NOT APPEAR TO WORK
        end note

        load_libraries() --> FragmentDataUploader
        run_analysis() --> Results
    }
    state Results {
        yswsii:Results Files
    }
    state MsDataUploader {
        MsDataUploader.svelte
        note left of MsDataUploader.svelte
        Layout of Mass Uploader Card

        Includes Typescript that allows user to upload file.
        end note
    }
    state FragmentDataUploader {
        FragmentDataUploader.svelte
        note right of FragmentDataUploader.svelte
        Layout of Fragment and Target Structures Card

        Includes Typescript that allows user to upload files.
        end note
    }
    state web/src/app.d.ts {
        VirtFile
        Pyio
        MsgType
        MassLibraryIndex
        FragmentsLibraryIndex
        MuropeptidesLibraryIndexIndex
        ReadyMsg
        ResultMsg
        ErrorMsg
        Msg
    }
    state masses {
        index1.json
        note left of index1.json
        Defines the files with molecular masses, one per species 
        along with description displayed in Website
        end note
        c_diff_monomers_complex.csv
        c_diff_monomers_non_redundant.csv
        c_diff_monomers_simple.csv
        e_coli_monomers_complex.csv
        e_coli_monomers_non_redundant.csv
        e_coli_monomers_simple.csv
    }
    state reference_masses {
        index2.json
        note left of index2.json
        Defines the files with reference masses for components used 
        in building masses
        end note
        e_coli1.csv
    }
    state target_structures {
        index3.json
        note left of index3.json
        Defines the files with target_structure, one per species 
        along with description displayed in Website

        NB - These are currently dummy files and internally have the 
        same data.
        end note
        b_subtillis.csv
        e_coli2.csv
        e_faecalis.csv
        e_faecium.csv
        s_aureus.csv
    }
    class masses,reference_masses,target_structures data
    class Svelte,MsDataUploader,FragmentDataUploader webui
    class pgfinder processing

TheLostLambda commented 1 month ago

@ns-rse Would you be available for a Google Meet or Zoom call soon? I can do early in my morning, to fit within your working hours?

If not, I can write up a proper reply! Either way it's good to see some architecture diagramming come together!

TheLostLambda commented 1 month ago

It would definitely be good to go through all of the documentation you've been writing on a call!

But in the meantime, briefly:

1) If I understand what you mean correctly, pyio isn't a function but a namespace that's used to move data from JS to Python. It's like from js import <whatever>, but it's a defined type / scope that doesn't leak the whole global JS namespace. It's based on this part of the docs: https://pyodide.org/en/stable/usage/type-conversions.html#importing-javascript-objects-into-python
2) Have you been building and importing a local Python wheel? Otherwise it's hard-coded to pull a specific PGFinder version from PyPI, which won't have any of your changes! Here is an old commit where I was importing a local wheel: https://github.com/Mesnage-Org/pgfinder/commit/70af49eaf1895a5b948b705048daf0d8ba39f608
3) Have you gotten anything in the web console as far as an error message? I've often just imported things by hand in the web-repl to test, before adding it into the script! It's not impossible that error-reporting is somehow obscured by the fact things are running in an async web-worker; I'd have to take a look!

For smithereens, things have improved a bit and CI is constantly checking I don't break WASM compilation, but I've not yet set up wasm-pack or anything for the main branch. The branch you're working from should definitely compile though, so if you're having trouble, I'm happy to debug on a call!

ns-rse commented 1 month ago

Hmm, wrote a reply late last night but seemed not to have committed it.

Sent you an invitation via Google to chat today at 16:00 (BST) if you're free.

If I understand what you mean correctly, the pyio isn't a function, but a namespace that's used to move data from JS to Python. It's like from js import <whatever>, but it's a defined type / scope that doesn't leak the whole global JS namespace. It's based on this part of the docs: pyodide.org/en/stable/usage/type-conversions.html#importing-javascript-objects-into-python

Getting JavaScript and Python to play ball is all new to me.

Have you been building and importing a local python wheel? Otherwise it's hard-coded to pull a specific PGFinder version from PyPi, which won't have any of your changes! Here is an old commit where I was importing a local wheel: 70af49e

Ah now this is something major that I'd not clocked, will check out that commit and see if I can test locally, having to use test.pypi.org or even make pre-releases is sub-optimal.

Have you gotten anything in the web console as far as an error message? I've often just imported things by hand in the web-repl to test, before adding it into the script! It's not impossible that error-reporting is somehow obscured by the fact things are running in an async web-worker — I'd have to take a look!

Only get errors when the page fails to render. Don't even know what web-repl is available with this framework.

TheLostLambda commented 1 month ago

@ns-rse Whoops, I had an invite I saw at some point for now?

ns-rse commented 4 weeks ago

Hi @TheLostLambda

Progress of sorts!

Switched to using the in-progress update to PGFinder which includes dummy (for now) building block files and muropeptide files along with JSON meta-data descriptors.
Got radio buttons for selecting which of these Fragments and Muropeptides files are to be uploaded. These are derived from JSON configuration files that sit alongside the (dummy for now) options, and the ToolTips work.

I'm now starting to work on how, once these have been selected, to call a function that runs Smithereens.

I attempted to adapt the logic in web/src/routes/+page.svelte to determine whether these files have been loaded, so that I could then start wrestling with the Smithereens Rust WebAssembly part to do the work, but attempts so far have been unsuccessful. I suspect this is because I'm not passing the value from the <input type="radio" ...> correctly. In each selector this is set to value={{ name: librariesMuropeptides['File'], content: null }} and value={{ name: librariesFragments['File'], content: null }} respectively.

I'm relying heavily on the existing example you've written in BuiltinLibrarySelector.svelte (and a lot of trial and error!). I see in there that the <ListBox>...</ListBox> is encapsulated by <svelte:fragment slot="content">...</svelte:fragment> and wondered if this is what I'm missing, so I tried adding it, but without success: I get the error message <svelte:fragment> must be a child of a component, and I can't see what it is a child of in your example. I have checked and the file names are accessible; I just can't get them passed back from BuiltinFragmentsSelector.svelte > FragmentsDataUploader.svelte and BuiltinMuropeptidesSelector.svelte > MuropeptidesDataUploader.svelte where, if I've understood correctly, they would then satisfy the logic I put in +page.svelte to say they are available, which is the following:

    // Reactively compute if Smithereens is ready
    $: SmithereensReady = !loading && !processing && pyio.fragmentsLibrary !== undefined && pyio.muropeptidesLibrary !== undefined;

Branch is ns-rse/259-muropeptides-fragment. Any insights or advice on what I've missed/done wrong would be very much appreciated (I will be working on PGFinder for the rest of today and tomorrow, as I'm mindful of looming deadlines).

TheLostLambda commented 4 weeks ago

Hi @ns-rse ! I'll jot down some notes here!

It's very possible that you're just working with things in a different directory, but for me I needed to move the .whl into the static folder and get rid of the . before the path (changing ./ to just /) in pgfinder.ts, but if the local version of PGFinder is loading for you, then that's totally fine!

As for the <svelte:fragment ...> stuff, that's just used for passing HTML into some parent component. Here those are populating the placeholder slots of that <AccordionItem>, but it doesn't do anything else magical — just lets you provide an HTML "argument" to some parent component. That component comes from the Skeleton UI library, which has some pretty good docs: https://www.skeleton.dev/components/accordions

Probably not a core issue, but something flagged up by my type-checker — the defaultPyio is missing muropeptidesLibrary and fragmentsLibrary.
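
The type-checker point can be illustrated with a small sketch. The field names fragmentsLibrary and muropeptidesLibrary and the { name, content } shape come from this thread; the PyioSketch and VirtFileSketch types, the string | null content type and the isSmithereensReady helper are assumptions made for illustration, not the real definitions in web/src/app.d.ts.

```typescript
// Hypothetical, trimmed-down stand-ins for the real types in web/src/app.d.ts.
interface VirtFileSketch {
    name: string;
    content: string | null; // assumption: null until the file has been read
}

interface PyioSketch {
    fragmentsLibrary: VirtFileSketch | undefined;
    muropeptidesLibrary: VirtFileSketch | undefined;
}

// Listing both fields explicitly (even as undefined) keeps the default object
// in step with the type, so the type-checker has nothing to complain about.
const defaultPyioSketch: PyioSketch = {
    fragmentsLibrary: undefined,
    muropeptidesLibrary: undefined
};

// The same condition as the reactive `$: SmithereensReady = ...` statement,
// written as a plain function so it can be checked in isolation.
function isSmithereensReady(pyio: PyioSketch, loading: boolean, processing: boolean): boolean {
    return (
        !loading &&
        !processing &&
        pyio.fragmentsLibrary !== undefined &&
        pyio.muropeptidesLibrary !== undefined
    );
}
```

With the defaults the function returns false, and it only returns true once both libraries have been selected and nothing is loading or processing.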

The pyio.fragmentsLibrary seems to be set correctly when you're doing a custom CSV file upload (adding $: console.log(pyio.fragmentsLibrary); shows that value updating after you choose a file).

The issue with the BuiltinFragmentsSelector was one probably brought on by my confusing variable naming... It was missing bind:group={value} which is what actually updates the value variable. The value= property has nothing to do with our value variable — it only specifies what the group value should be set to if that button is selected! That one is working too, now that that missing bind bit has been added.

Custom muropeptidesLibrary upload is already working! Again verified with $: console.log(pyio.muropeptidesLibrary); in +page.svelte.

Same missing bind:group={value} for BuiltinMuropeptidesSelector! After adding that, it works perfectly!

I've pushed a commit with the working directory I ended up with, including some messy console.log() stuff that's worth removing after you're done debugging, as is the .whl I committed to the static directory, but let me know if this was the sort of thing you needed to get unstuck!

ns-rse commented 4 weeks ago

Hi @TheLostLambda

Oooh, I was close! Thank you for all of that, really helpful.

It's very possible that you're just working with things in a different directory, but for me I needed to move the .whl into the static folder and get rid of the . before the path (changing ./ to just /) in pgfinder.ts, but if the local version of PGFinder is loading for you, then that's totally fine!

Not sure about this, I've maintained the repository with web/ and lib/ sub-directories and it "Just Works(tm)", I figured that because the pgfinderVersion string includes the commit hashes that it was picking up the package ok.

As for the <svelte:fragment ...> stuff, that's just used for passing HTML into some parent component. Here those are populating the placeholder slots of that <AccordionItem>, but it doesn't do anything else magical — just lets you provide an HTML "argument" to some parent component. That component comes from the Skeleton UI library, which has some pretty good docs: https://www.skeleton.dev/components/accordions

Thanks for the explanation/pointer that is going to be really useful. I haven't found the Svelte examples that informative yet. Once I've got it working as it currently is I'll perhaps look at replacing with RadioGroups.

Probably not a core issue, but something flagged up by my type-checker — the defaultPyio is missing muropeptidesLibrary and fragmentsLibrary.

I had a suspicion I had missed something, again thank you for finding and correcting.

The pyio.fragmentsLibrary seems to be set correctly when you're doing a custom CSV file upload (adding $: console.log(pyio.fragmentsLibrary); shows that value updating after you choose a file).

Ahha, didn't know about adding debugging that way, thank you for the pointer. I hadn't got round to trying uploading custom CSV files yet either as I've been focusing on loading the default (dummy) files I've added. Good to know that is already working.

The issue with the BuiltinFragmentsSelector was one probably brought on by my confusing variable naming... It was missing bind:group={value} which is what actually updates the value variable. The value= property has nothing to do with our value variable — it only specifies what the group value should be set to if that button is selected! That one is working too, now that that missing bind bit has been added.

Custom muropeptidesLibrary upload is already working! Again verified with $: console.log(pyio.muropeptidesLibrary); in +page.svelte.

Same missing bind:group={value} for BuiltinMuropeptidesSelector! After adding that, it works perfectly!

I thought I was missing something for value and tried several variations without success, again thanks for the pointers and corrections.

I've pushed a commit with the working directory I ended up with, including some messy console.log() stuff that's worth removing after you're done debugging, as is the .whl I committed to the static directory, but let me know if this was the sort of thing you needed to get unstuck!

Spot on, got me back on track and I'll start working on using the selected files with Smithereens, thank you :pray:

ns-rse commented 3 weeks ago

Hi @TheLostLambda ,

I'm stumped again as I'm unsure what the functions/classes are that have been compiled into the Smithereens WebAssembly.

I'm using the Rust WASM : hello world example as a basis and looking through the files there are...

`smithereens_bg.wasm.d.ts`

TypeScript that exports a bunch of functions (with types), I'd hazard a guess that those beginning with __ are not called directly which leaves...

peptidoglycan_new(a: number, b: number, c: number): void - ?
peptidoglycan_monoisotopic_mass(a: number, b: number, c: number): void - ?
pg_to_fragments(a: number, b: number, c: number): void - sounds like it's the reverse of the builder and breaks something down?

`smithereens_bg.wasm`

This is the compiled WebAssembly; are its contents exposed/made available by the above file?

`smithereens_bg.js`

This "contains JavaScript glue for importing DOM and JavaScript functions into Rust and exposing a nice API to the WebAssembly functions to JavaScript." so I don't think I need to touch or do anything with this? :shrug:

I can see it defines and exports a Peptidoglycan() class is this what I need to use?

`smithereens.d.ts`

TypeScript that exports function and class (with types).

function

pg_to_fragments(precursor: Peptidoglycan): string: - Perhaps related to pg_to_fragments() that has its type defined in smithereens_bg.wasm.d.ts but unclear how?

Class

Peptidoglycan ...

export class Peptidoglycan {
  free(): void;
/**
* @param {string} structure
*/
  constructor(structure: string);
/**
* @returns {string}
*/
  monoisotopic_mass(): string;
}

Questions

What is the significance of _bg in the JavaScript/TypeScript files?
What is the Class/function in Smithereens that I need to use?

From a very crude/cursory play with smithereens, building the package natively results in a binary called smithereens which can be run.

❱ smithereens
Molecule: H2O
Monoisotopic Mass: 18.010565
Average Mass: 18.0153
Charge: 0

Molecule: CH4
Monoisotopic Mass: 16.031300
Average Mass: 16.0425
Charge: 0

Molecule: C2H6
Monoisotopic Mass: 30.046950
Average Mass: 30.0691
Charge: 0

Molecule:

I don't understand how the WASM you compiled relates to this and what functions/classes need to be used to take the FragmentsLibrary and MuropeptidesDataLibrary that are loaded from PGFinder.

Pointers very much welcome. Afraid I have child care duties this evening so can't easily jump on a call hence asking here.

TheLostLambda commented 3 weeks ago

Hi @ns-rse !

As for the _bg files, those look like internal "bindgen" files that we can safely ignore / not touch! https://github.com/rustwasm/wasm-bindgen/issues/2290
The smithereens.d.ts has the interface you need! This little bit of code demonstrates everything currently exported: https://github.com/TheLostLambda/pg-pipeline/blob/wasm-pilot/test-webapp/index.js . Note that that's on the wasm-pilot branch, and that code doesn't quite exist on the main branch yet!
I'd need to check with @smesnage , but those fragment libraries might just be pre-computed downloads? The muropeptide libraries can have their masses calculated (by calling .monoisotopic_mass()), and then those structures can be fragmented by pg_to_fragments(...). Essentially, I think the key is just repeating the three lines of code in https://github.com/TheLostLambda/pg-pipeline/blob/wasm-pilot/test-webapp/index.js for every structure gm-A, gm-AEJA, gm-AEJA=gm-AEJ, etc in those muropeptide libraries!
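
A rough sketch of "repeat those lines for every structure" might look like the following. The method name monoisotopic_mass() and the function pg_to_fragments() are taken from smithereens.d.ts as listed earlier in this thread; the analyseStructures helper, the StructureResult shape and the idea of passing the constructor and fragmenter in as arguments are assumptions for illustration, so the sketch does not depend on how the WASM module ends up being imported.

```typescript
// The part of the exported class this sketch relies on, per smithereens.d.ts.
interface PeptidoglycanLike {
    monoisotopic_mass(): string;
}

// Hypothetical result record, one per muropeptide structure.
interface StructureResult {
    structure: string;
    monoisotopicMass: string;
    fragments: string;
}

// Hypothetical helper: builds each structure, reads its mass, then fragments it,
// in the same order as the index.js example on the wasm-pilot branch.
function analyseStructures<P extends PeptidoglycanLike>(
    structures: string[],
    makePeptidoglycan: (structure: string) => P,
    toFragments: (precursor: P) => string
): StructureResult[] {
    return structures.map((structure) => {
        const pg = makePeptidoglycan(structure);
        const monoisotopicMass = pg.monoisotopic_mass();
        const fragments = toFragments(pg);
        return { structure, monoisotopicMass, fragments };
    });
}
```

With the real module loaded, the call would presumably be something like analyseStructures(structures, (s) => new Peptidoglycan(s), pg_to_fragments), where structures is the list of names (gm-A, gm-AEJA, ...) read from a muropeptide library.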

Let me know if that's the info you were after! Definitely check out what that example on wasm-pilot returns on the web console!

ns-rse commented 3 weeks ago

Hi @TheLostLambda

Again really helpful. I'll have a look at this tomorrow and hope to make progress.

Cheers :+1:

ns-rse commented 2 weeks ago

Hi @TheLostLambda

Took a bit longer to look at this and whilst the example was useful I'm struggling to adapt it to work with Svelte.

Currently I've at least two problems that I'm aware of.

Python

Progress

I've added the example files to pgfinder and written JSON descriptors with Metadata.

I've put in place all the boiler plate code to add these to Svelte and I have a box which gets the JSON files, reads the metadata and allows selection of components and there is a "Build Database" button to build the database of muropeptide masses based on the provided mass library.

Problem

I don't understand how to make this button reactive.

For processing with PGFinder the src/lib/pgfinder.ts has onmessage defined which calls pyodide.runPythonAsync('run_analysis()').then(postResults).catch(postError); and run_analysis() is defined in pgfinder/gui/shim.py.

I've added a load_libraries() function to pgfinder/gui/shim.py. It should (I hope!) load the fragmentsLibrary and muropeptidesLibrary from the pyio that is defined in +page.svelte, but I cannot work out how to make onmessage unique to each button that is now present in the WebUI. :shrug:

Rust WASM

The example you pointed me to was useful so I added

import * as smithereens_wasm from "$lib/smithereens";

This throws an error...

"ESM integration proposal for Wasm" is not supported currently. Use vite-plugin-wasm or other community plugins to handle this. Alternatively, you can use `.wasm?init` or `.wasm?url`. See https://vitejs.dev/guide/features.html#webassembly for more details.

...and so I duly installed vite-plugin-wasm and imported it before trying to import smithereens...

import wasm from "vite-plugin-wasm";
import * as smithereens_wasm from "$lib/smithereens";

...but I still get the above complaint/error that I need to use vite-plugin-wasm.

Thus aside from not yet being able to load the two libraries because of the Python problem, I wouldn't be able to do anything with them because I can't import the Rust WASM compiled smithereens.

It feels like I've gone round in many circles tinkering with x, y and z, and I have read through the Rust WASM pages multiple times but not gained any insight.

Any suggestions as to how to 1) run the shim.load_libraries() function and 2) import the Rust WASM so these can be passed/used would again be very much appreciated.

Current state of play is on ns-rse/259-muropeptides-fragment

:confused:

TheLostLambda commented 2 weeks ago

Hi @ns-rse !

Working in reverse here, these web bundlers (like Vite, used here) have always been a pain-point for me as well, but there is a Vite plugin (a bit different from the one you've added) that I've used successfully before: https://github.com/nshen/vite-plugin-wasm-pack . That one is meant to work with Rust's wasm-pack in particular!

These bundler plugins are configured in their own file, and the bundler will then automatically resolve imports when it comes across them!

Following the manual install process, I end up with: vite.config.ts

import { purgeCss } from 'vite-plugin-tailwind-purgecss';
import { sveltekit } from '@sveltejs/kit/vite';
import { defineConfig } from 'vite';
import wasmPack from 'vite-plugin-wasm-pack';

export default defineConfig({
    plugins: [sveltekit(), purgeCss(), wasmPack('./smithereens')],
    worker: {
        format: 'es'
    }
});

(I moved the smithereens stuff around, since vite-plugin-wasm-pack wants things in a pkg subdirectory)

Then in +page.svelte:

import init, { Peptidoglycan, pg_to_fragments } from 'smithereens';
// ...
    onMount(() => {
        init().then(() => {
            console.log("smithereens wasm loaded!");
        })
    })
// ...
function runSmithereensAnalysis() {
    let pg = new Peptidoglycan("gm-AEJA")
    console.log(`Monoisotopic Mass : ${pg.monoisotopic_mass()}`);
    console.log(`Fragments :\n ${pg_to_fragments(pg)}`);
}

To fix another bundler error, I needed to add "type": "module" to the smithereens package.json, since I generated that before this bug was fixed: https://github.com/rustwasm/wasm-pack/pull/1061

Here is the example I used to figure out that loading process! https://github.com/nshen/vite-plugin-wasm-pack/blob/main/example/src/index.ts

After that, and on the current version of the branch I just pushed, things are working! You can call the smithereens WASM functions!

As for the first half of your question, the PGFinder Pyodide code all runs in a separate "thread" from the main JavaScript (called a web-worker: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers), and the API of that is very minimalist. You can pass arbitrary messages back and forth, and it's up to you to write code that distinguishes between the different events being sent over the channel. So the most direct answer to your question might be adding a new message type, using the same onmessage, but then checking the type field and doing something different depending on its value.
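
As a minimal sketch of that "check the type field" idea: the message type names RunAnalysis and LoadLibraries, the WorkerRequest union and the routeRequest function are all hypothetical (the real MsgType lives in web/src/app.d.ts), while run_analysis() and load_libraries() are the shim functions mentioned in this thread. The Python runner is passed in as an argument so the routing can be exercised without Pyodide.

```typescript
// Hypothetical message shapes; each button would post a different `type`.
type WorkerRequest =
    | { type: 'RunAnalysis' }
    | { type: 'LoadLibraries' };

// In the worker this could be (code) => pyodide.runPythonAsync(code).
type RunPython = (code: string) => Promise<unknown>;

// One onmessage handler, many behaviours: branch on the tag of the message.
function routeRequest(request: WorkerRequest, runPython: RunPython): Promise<unknown> {
    switch (request.type) {
        case 'RunAnalysis':
            return runPython('run_analysis()');
        case 'LoadLibraries':
            return runPython('load_libraries()');
        default:
            // Reached only if something posts a message outside the declared union.
            return Promise.reject(new Error('Unknown message type'));
    }
}
```

Inside pgfinder.ts the existing onmessage could then hand event.data to routeRequest and keep the .then(...).catch(postError) chain, with each result handler chosen to suit the message that was sent.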

With that being said, I think there is probably a better way to do this! I think we should create a new web-worker, since smithereens and pgfinder can (and should) work independently from each other! That would mean creating something like a smithereens.ts file, and adding an import Smithereens from '$lib/smithereens.ts?worker'; or similar to +page.svelte. Then you can set up your own messages to communicate once smithereens is loaded (after init() returns), and to send data back and forth.

Using the pgfinder.ts file as a template for how to do this sort of thing would be reasonable!
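
To give a feel for what such a smithereens.ts worker might contain, here is a sketch of the worker-side logic only. Everything named here (SmithereensRequest, SmithereensResponse, SmithereensApi, createSmithereensWorker and the Fragment / Ready / Result / Error message types) is hypothetical; the only things taken from this thread are that init() must resolve before the WASM exports are used and that messages are how the two sides communicate. The WASM calls and the posting function are injected so the sketch stands alone.

```typescript
// Hypothetical message shapes for a dedicated smithereens worker.
type SmithereensRequest = { type: 'Fragment'; structure: string };
type SmithereensResponse =
    | { type: 'Ready' }
    | { type: 'Result'; structure: string; mass: string; fragments: string }
    | { type: 'Error'; message: string };

// Hypothetical wrapper around the WASM exports (init, Peptidoglycan, pg_to_fragments).
interface SmithereensApi {
    init: () => Promise<unknown>;
    analyse: (structure: string) => { mass: string; fragments: string };
}

function createSmithereensWorker(
    api: SmithereensApi,
    post: (message: SmithereensResponse) => void
): (request: SmithereensRequest) => Promise<void> {
    // Start loading straight away and announce readiness once init() resolves.
    const ready = api.init().then(() => post({ type: 'Ready' }));

    return async function handle(request: SmithereensRequest): Promise<void> {
        try {
            await ready; // never touch the WASM exports before init() has finished
            const { mass, fragments } = api.analyse(request.structure);
            post({ type: 'Result', structure: request.structure, mass, fragments });
        } catch (error) {
            post({ type: 'Error', message: String(error) });
        }
    };
}
```

In the worker file, the returned handler would be wired to onmessage and post to postMessage; on the page, the Ready message could drive the same kind of flag that currently gates the PGFinder analysis button.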

Sorry that's not super exhaustive, but hopefully it's enough to get you unstuck on the WASM loading front, and gives you a place to start with making a new web worker for smithereens to live in!

Let me know if you have any more questions!

ns-rse commented 2 weeks ago

Thanks for that @TheLostLambda will have a go at working through all this and no doubt get stuck again and be back with more questions. :smile:

ns-rse commented 2 days ago

Hi @TheLostLambda

Some progress but have still hit a rock and have little idea of what I'm doing.

Rust WASM

`vite.config.ts`

As advised I currently have...

import { purgeCss } from 'vite-plugin-tailwind-purgecss';
import { sveltekit } from '@sveltejs/kit/vite';
import { defineConfig } from 'vite';
import wasmPack from 'vite-plugin-wasm-pack';

export default defineConfig({
    plugins: [sveltekit(), purgeCss(), wasmPack('./smithereens/')],
    worker: {
        format: 'es'
    }
});

...and ./smithereens/ lives within the top level of web/ and all of the pre-compiled Rust Web Assembly files have been placed there...

(pgfinder2024) ❱ pwd            
/home/neil/work/git/hub/Mesnage-Org/pgfinder/web
(pgfinder2024) ❱ tree smithereens 
[4.0K May  3 15:52]  smithereens
└── [4.0K May  3 17:00]  smithereens/pkg
    ├── [ 409 May  3 16:58]  smithereens/pkg/package.json
    ├── [ 10K May  3 15:52]  smithereens/pkg/smithereens_bg.js
    ├── [5.6M May  3 15:52]  smithereens/pkg/smithereens_bg.wasm
    ├── [ 702 May  3 15:52]  smithereens/pkg/smithereens_bg.wasm.d.ts
    ├── [ 352 May  3 15:52]  smithereens/pkg/smithereens.d.ts
    └── [ 160 May  3 15:52]  smithereens/pkg/smithereens.js

2 directories, 6 files

`smithereens/pkg/package.json`

As suggested I added "type": "module", it currently looks like...

{
  "name": "smithereens",
  "version": "0.1.0",
  "type": "module",
  "files": [
    "smithereens_bg.wasm",
    "smithereens.js",
    "smithereens_bg.js",
    "smithereens.d.ts"
  ],
  "module": "smithereens.js",
  "types": "smithereens.d.ts",
  "sideEffects": [
    "./smithereens.js",
    "./snippets/*"
  ],
  "dependencies": {
    "vite-plugin-wasm": "^3.3.0",
    "vite-plugin-wasm-pack": "^0.1.12"
  }
}

`+page.svelte.ts`

Attempted to import Smithereens by adding...

import init, { Peptidoglycan, pg_to_fragments } from 'smithereens';

But it errors with...

11:33:44 [vite] Internal server error: Failed to resolve import "./smithereens_bg.wasm" from "@vite-plugin-wasm-pack@smithereens". Does the file exist?
  Plugin: vite:import-analysis
  File: @vite-plugin-wasm-pack@smithereens:1:24
  1  |  import * as wasm from "./smithereens_bg.wasm";
     |                         ^
  2  |  import { __wbg_set_wasm } from "./smithereens_bg.js";
  3  |  __wbg_set_wasm(wasm);
      at formatError (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:44062:46)
      at TransformContext.error (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:44058:19)
      at normalizeUrl (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:41844:33)
      at async file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:41998:47
      at async Promise.all (index 0)
      at async TransformContext.transform (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:41914:13)
      at async Object.transform (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:44352:30)
      at async loadAndTransform (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:55026:29)
      at async viteTransformMiddleware (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:64430:32)

Which I found confusing since this is coming from smithereens/pkg/smithereens.js and it sits alongside smithereens_bg.wasm in the same directory.

My first thought was "Maybe I need to put the full path in?" so changed smithereens/pkg/smithereens.js to...

import * as wasm from "./smithereens/pkg/smithereens_bg.wasm";
import { __wbg_set_wasm } from "./smithereens/pkg/smithereens_bg.js";
__wbg_set_wasm(wasm);
export * from "./smithereens/pkg/smithereens_bg.js";

...and got a new error message (progress!!!)...

11:39:45 [vite] page reload smithereens/pkg/smithereens.js
11:39:45 [vite-plugin-svelte] /home/neil/work/git/hub/Mesnage-Org/pgfinder/web/src/routes/+page.svelte:95:2 $: has no effect outside of the top-level
11:39:45 [vite-plugin-svelte] /home/neil/work/git/hub/Mesnage-Org/pgfinder/web/src/routes/+page.svelte:98:2 $: has no effect outside of the top-level
11:39:45 [vite-plugin-svelte] /home/neil/work/git/hub/Mesnage-Org/pgfinder/web/src/routes/+page.svelte:100:2 $: has no effect outside of the top-level
11:39:45 [vite] Error when evaluating SSR module /src/routes/+page.svelte: failed to import "@vite-plugin-wasm-pack@smithereens"
|- Error: "ESM integration proposal for Wasm" is not supported currently. Use vite-plugin-wasm or other community plugins to handle this. Alternatively, you can use `.wasm?init` or `.wasm?url`. See https://vitejs.dev/guide/features.html#webassembly for more details.
    at Context.load (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:42482:19)
    at Object.load (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:44323:64)
    at async loadAndTransform (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:54954:24)
    at async instantiateModule (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:55951:10)

...which I've seen before, even though I have already installed vite-plugin-wasm. So I tried the alternative of adding ?init to the import in web/smithereens/pkg/smithereens.js

import * as wasm from "./smithereens/pkg/smithereens_bg.wasm?init";
import { __wbg_set_wasm } from "./smithereens/pkg/smithereens_bg.js";
__wbg_set_wasm(wasm);
export * from "./smithereens/pkg/smithereens_bg.js";

No joy, still get an error.

11:45:21 [vite] page reload smithereens/pkg/smithereens.js
11:45:21 [vite-plugin-svelte] /home/neil/work/git/hub/Mesnage-Org/pgfinder/web/src/routes/+page.svelte:95:2 $: has no effect outside of the top-level
11:45:21 [vite-plugin-svelte] /home/neil/work/git/hub/Mesnage-Org/pgfinder/web/src/routes/+page.svelte:98:2 $: has no effect outside of the top-level
11:45:21 [vite-plugin-svelte] /home/neil/work/git/hub/Mesnage-Org/pgfinder/web/src/routes/+page.svelte:100:2 $: has no effect outside of the top-level
11:45:21 [vite] Error when evaluating SSR module /src/routes/+page.svelte: failed to import "@vite-plugin-wasm-pack@smithereens"
|- Error: "ESM integration proposal for Wasm" is not supported currently. Use vite-plugin-wasm or other community plugins to handle this. Alternatively, you can use `.wasm?init` or `.wasm?url`. See https://vitejs.dev/guide/features.html#webassembly for more details.
    at Context.load (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:42482:19)
    at Object.load (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:44323:64)
    at async loadAndTransform (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:54954:24)
    at async instantiateModule (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:55951:10)

Error: "ESM integration proposal for Wasm" is not supported currently. Use vite-plugin-wasm or other community plugins to handle this. Alternatively, you can use `.wasm?init` or `.wasm?url`. See https://vitejs.dev/guide/features.html#webassembly for more details.
    at Context.load (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:42482:19)
    at Object.load (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:44323:64)
    at async loadAndTransform (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:54954:24)
    at async instantiateModule (file:///home/neil/work/git/hub/Mesnage-Org/pgfinder/web/node_modules/.pnpm/vite@4.5.2_@types+node@20.5.9/node_modules/vite/dist/node/chunks/dep-52909643.js:55951:10)

If I disable this import the page at least loads, although the buttons are no longer populated as the pgfinder.ts worker has had functionality moved...

Pyodide - smithereens worker.

Had a go at adding a new worker, as suggested, using web/src/lib/pgfinder.ts as a template.

This new worker will have to load the fragmentsLibraries and muropeptidesLibraries, which reside under lib/pgfinder/reference_masses/{index.json,e_coli.csv} and lib/pgfinder/target_structures/{index.json,b_subtilis.csv,e_coli.csv,e_faecalis.csv,e_faecium.csv,s_aureus.csv}, so I've left the (async () => {...})(); section in there but modified the postMessage to have content of fragmentsLibraries and muropeptidesLibraries...

    // Load lib/pgfinder/reference_masses/index.json, which details the available reference masses
    const jsonFragments = await pyodide.runPythonAsync('reference_mass_library_index()');
    const fragmentsLibraries = JSON.parse(jsonFragments);

    // Load lib/pgfinder/target_structures/index.json, which details the available target structures (muropeptides)
    const jsonMuropeptides = await pyodide.runPythonAsync('target_structure_library_index()');
    const muropeptidesLibraries = JSON.parse(jsonMuropeptides);

    postMessage({
        type: 'Ready',
        content: {
            fragmentsLibraries,
            muropeptidesLibraries
        }
    });

At some point this will need to load the libraries the user selects and then pass them to the Rust WASM (once that is working).
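
For the receiving side, here is a minimal sketch of how the main thread might validate that 'Ready' message before populating the buttons. The names `isReadyMessage` and `libraryNames` are hypothetical, and it assumes each index.json parses to an object keyed by library name, which is not confirmed anywhere in this thread.

```typescript
// Assumption: each parsed index.json is an object keyed by library name.
type LibraryIndex = Record<string, unknown>;

interface ReadyMessage {
    type: 'Ready';
    content: {
        fragmentsLibraries: LibraryIndex;
        muropeptidesLibraries: LibraryIndex;
    };
}

// Small helper: true for non-null objects only.
function isObject(value: unknown): value is Record<string, unknown> {
    return typeof value === 'object' && value !== null;
}

// Hypothetical type guard: checks the message has the shape posted by the worker above.
function isReadyMessage(msg: unknown): msg is ReadyMessage {
    if (!isObject(msg) || msg.type !== 'Ready' || !isObject(msg.content)) {
        return false;
    }
    return isObject(msg.content.fragmentsLibraries) && isObject(msg.content.muropeptidesLibraries);
}

// Hypothetical helper: returns the library names to show in the UI,
// or null if the message is not a well-formed 'Ready' message.
function libraryNames(msg: unknown): { fragments: string[]; muropeptides: string[] } | null {
    if (!isReadyMessage(msg)) {
        return null;
    }
    return {
        fragments: Object.keys(msg.content.fragmentsLibraries),
        muropeptides: Object.keys(msg.content.muropeptidesLibraries)
    };
}
```

In the page this could be called from the worker's `onmessage` handler with `event.data`, ignoring any message for which it returns null.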

I don't really feel like I've got much idea at all what I'm doing here. That has been the case most days I've looked at this, since it's really not clear how this all knits together (hence my attempt to diagram it above).

It would be useful to sit down and go through it, but with time differences it's going to be tricky as I have child care again this evening and am busy tomorrow evening. Friday is a possibility but I wouldn't have masses of time due to family.

I realise you are busy with your own work but I've pushed everything as is to my branch (ns-rse/259-muropeptides-fragment) if you want to have a poke around.

TheLostLambda commented 1 day ago

@ns-rse Just a quick bit since I'm now in a time-zone for semi-immediate help: For the first bit, I think you might be missing calling the init function for the WASM?

https://github.com/Mesnage-Org/pgfinder/blob/c270df29771271ddefca48bda7c63a781bf3e55e/web/src/routes/%2Bpage.svelte#L94-L96

And that will need to be imported like this:

https://github.com/Mesnage-Org/pgfinder/blob/c270df29771271ddefca48bda7c63a781bf3e55e/web/src/routes/%2Bpage.svelte#L30
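
As a rough sketch of what "calling the init function" could look like: the helper below wraps an init function so it runs once and every caller awaits the same promise. `makeWasmLoader` is a hypothetical name, and `InitFn` only stands in for the default export of the smithereens package, whose real signature isn't shown in this thread.

```typescript
// Assumption: the package's default export is an async init function that
// must finish before any other export (e.g. Peptidoglycan) is used.
type InitFn = () => Promise<unknown>;

// Hypothetical helper: returns a loader that calls `init` at most once
// while it is pending or has succeeded, and retries after a failure.
function makeWasmLoader(init: InitFn): () => Promise<void> {
    let ready: Promise<void> | null = null;
    return () => {
        if (ready === null) {
            ready = init().then(
                () => undefined,
                (err) => {
                    // Forget the failed attempt so a later call can try again.
                    ready = null;
                    throw err;
                }
            );
        }
        return ready;
    };
}
```

With something like `const loadSmithereens = makeWasmLoader(init);`, the page would `await loadSmithereens()` before touching `Peptidoglycan` or `pg_to_fragments`.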

(I'll have a closer look at your second part soon!)

TheLostLambda commented 1 day ago

Okay, I read the second bit and I'm back in the UK! So hopefully that makes time-zone stuff a non-issue!

Just let me know whenever you'd want to meet!

ns-rse commented 1 day ago

@TheLostLambda

import init, { Peptidoglycan, pg_to_fragments } from 'smithereens';

I've tried adding that; it's on line 35 (it looks like your local branch, or at least the commit you've linked to, is about four commits behind the HEAD of ns-rse/259-muropeptides-fragment). It causes the above crash.

If you're free this afternoon, I'm around until 17:00 and will put something in the calendar.
