ProteoWizard / pwiz

The ProteoWizard Library is a set of software libraries and tools for rapid development of mass spectrometry and proteomic data analysis software.
http://proteowizard.sourceforge.net/
Apache License 2.0

Migrate all tutorials to Html #3097

Closed eduardo-proteinms closed 3 weeks ago

eduardo-proteinms commented 2 months ago

This PR migrates all docx files in the tutorials directory (pwiz_tools\Skyline\Documentation\Tutorials) to HTML. Be careful opening the full diff for this PR, as it is too large to process. The best way to review the result is to check out the branch and compare the generated documents against the existing ones.

Html Conversion

Converting docx documents to HTML has a lot of complications. Not all Word document elements or styles can be matched perfectly while keeping a clean HTML output. I researched many conversion libraries and found that only mammoth generated clean HTML and was easily extensible and configurable. Although this library has implementations in several languages, the JavaScript one is the most actively maintained and the simplest to fork. We leverage mammoth-js for the following key features:

The implementation of mammoth-js is not overly complex, and for certain use cases I found the need to fork the library. This fork currently lives in my space (mammoth.js/tree/skyline-document-generation), but I will move it to the pwiz one. I implemented the following changes in the library:

Image Conversion

Images are written to files by simply leveraging mammoth's image reader. The html document has a relative link to this image.
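A minimal sketch of what such an image handler could look like, assuming mammoth's documented `convertImage` option and `image.read()` call. The helper names (`imageFileName`, `makeImageWriter`), the `image-N` naming scheme, and the content-type table are illustrative assumptions, not the code in skyline.js.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed mapping from Word image content types to file extensions.
const EXTENSIONS = {
  "image/png": "png",
  "image/jpeg": "jpg",
  "image/gif": "gif",
  "image/x-emf": "emf",
  "image/x-wmf": "wmf",
};

// Build a file name such as "image-0.png" (hypothetical naming scheme).
function imageFileName(index, contentType) {
  const extension = EXTENSIONS[contentType] || "bin";
  return `image-${index}.${extension}`;
}

// Returns a handler intended for mammoth.images.imgElement(handler).
// Each image is written into outputDir and linked by a relative path.
function makeImageWriter(outputDir) {
  let index = 0;
  return async function writeImage(image) {
    const fileName = imageFileName(index++, image.contentType);
    const buffer = await image.read(); // no encoding given: a Buffer is returned
    fs.mkdirSync(outputDir, { recursive: true });
    fs.writeFileSync(path.join(outputDir, fileName), buffer);
    return { src: fileName }; // becomes the src attribute of the <img> element
  };
}

// Usage (requires the mammoth package):
// const mammoth = require("mammoth");
// mammoth.convertToHtml({ path: "tutorial.docx" },
//   { convertImage: mammoth.images.imgElement(makeImageWriter("out")) });
```

Returning only the file name keeps the link relative, so the HTML and its images can be moved together.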

For EMF and WMF images we needed special handling as browsers don't usually render these images. We discussed converting the images to PNG but ideally wanted to convert these to SVG as they keep their vector characteristics. To do this conversion I leveraged metanorma/libemf2svg which is a fork of the original libemf2svg but adds vcpkg support.

The main conversion script skyline.js leverages this library to do the image conversion alongside the document conversion. The document converter links original EMF and WMF to the converted image.
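The linking step can be sketched as a small path rewrite. This assumes the converted SVG is written next to the original with the same base name; the function name `browserFriendlySrc` is hypothetical, and the actual call into libemf2svg is not shown.

```javascript
const path = require("path");

// Metafile extensions that browsers generally cannot render directly.
const METAFILE_EXTENSIONS = new Set([".emf", ".wmf"]);

// Return the path the HTML should link to: the converted .svg for
// EMF/WMF files, the original path for everything else.
function browserFriendlySrc(src) {
  const extension = path.posix.extname(src);
  if (!METAFILE_EXTENSIONS.has(extension.toLowerCase())) {
    return src;
  }
  return src.slice(0, -extension.length) + ".svg";
}
```

For example, `browserFriendlySrc("image-0.wmf")` gives `"image-0.svg"`, while a PNG path is returned unchanged.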

Html Formatting

This iteration leverages js-beautify to format the html. This could use some more thought as we want the html formatting to align with our editor/formatter of choice in development.

Default values are used to format the html other than the wrap_line_length which is set to 150.
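As a rough sketch of that configuration: only `wrap_line_length` is set, and the beautifier itself is passed in as a parameter so the example does not assume how js-beautify is installed. The names `FORMAT_OPTIONS` and `formatHtml` are illustrative, not taken from skyline.js.

```javascript
// Only wrap_line_length is overridden; every other js-beautify option
// keeps its default value.
const FORMAT_OPTIONS = { wrap_line_length: 150 };

// `beautifyHtml` is expected to be require("js-beautify").html.
function formatHtml(html, beautifyHtml) {
  return beautifyHtml(html, FORMAT_OPTIONS);
}

// Usage (requires the js-beautify package):
// const beautifyHtml = require("js-beautify").html;
// const formatted = formatHtml("<div><p>Hello</p></div>", beautifyHtml);
```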

Document Styling and Formatting

It is difficult to reproduce all the styling and formatting of the Word documents. Using mammoth, we map some styles to classes defined in SkylineStyles.css.

This gives us a good starting point for shared styling across all documents.
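A sketch of how such a mapping might be expressed with mammoth's `styleMap` option. The classes `p.subtitle` and `p.bibliography` appear in SkylineStyles.css, but the Word style names on the left are assumptions, as are the names `STYLE_CLASSES` and `buildStyleMap`.

```javascript
// Hypothetical Word paragraph style names mapped to CSS classes.
const STYLE_CLASSES = {
  "Subtitle": "p.subtitle",
  "Bibliography": "p.bibliography",
};

// Turn the table into mammoth style-map lines, for example
// "p[style-name='Subtitle'] => p.subtitle:fresh".
function buildStyleMap(styleClasses) {
  return Object.entries(styleClasses).map(
    ([styleName, target]) => `p[style-name='${styleName}'] => ${target}:fresh`
  );
}

// Usage (requires the mammoth package):
// mammoth.convertToHtml({ path: "tutorial.docx" },
//   { styleMap: buildStyleMap(STYLE_CLASSES) });
```

Keeping the mapping in one table means a new class in the stylesheet needs only one new entry here.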

Issues

The remaining issues in my opinion will require manual checks and changes unless we find a pattern that affects too many cases.

Image Conversion

WMF and EMF conversion is successful for all images except one. libemf2svg does not accept Skyline Processing Grouped Study Data\image-0.wmf (in all languages) as valid:

WARNING(scanning): EMF file does not begin with an EMR_HEADER record
ABORTING(scanning): invalid record - corrupted file?

We can fix this as a manual step by converting the image to a PNG.
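Files like this could be flagged before conversion with a simple header check. The sketch below is not libemf2svg's own validation; it only tests the two fields that the EMF format places in the first record (record type 1 at offset 0 and the " EMF" signature at offset 40). The function name `startsWithEmrHeader` is hypothetical.

```javascript
const EMR_HEADER_TYPE = 0x00000001; // record type of EMR_HEADER
const EMF_SIGNATURE = 0x464d4520;   // " EMF", stored at byte offset 40
const MIN_HEADER_BYTES = 44;        // enough bytes to read type through signature

// True when the buffer begins with something that looks like an EMR_HEADER.
function startsWithEmrHeader(buffer) {
  if (buffer.length < MIN_HEADER_BYTES) {
    return false;
  }
  return buffer.readUInt32LE(0) === EMR_HEADER_TYPE &&
         buffer.readUInt32LE(40) === EMF_SIGNATURE;
}

// Usage:
// const fs = require("fs");
// if (!startsWithEmrHeader(fs.readFileSync("image-0.wmf"))) {
//   console.warn("not an EMF stream; convert this image manually");
// }
```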

Stroke Element

Images have custom stroke elements drawn on them:

image

We can manually take a screenshot of these images.

Affects: Skyline Targeted Method Refinement.docx, Skyline Ion Mobility Spectrum Filtering.docx, Skyline Processing Grouped Study Data.docx

Math Element

image

Affects: Skyline Existing and Quantitative Experiments.docx, Skyline iRT Retention Time Prediction.docx

vPath Element

image

Affects: Skyline Targeted Method Refinement.docx, Skyline Ion Mobility Spectrum Filtering.docx, Skyline PRM.docx

Oval Element

image

Affects: Skyline Custom Reports.docx

AnchorLock Element

Some elements are anchored to a position in the document. This does not really translate to the output HTML:

image

Affects: Skyline Ion Mobility Spectrum Filtering.docx, Skyline PRM.docx, Skyline Custom Reports.docx (solving the oval issue should solve this one)

OLEObject Element

I'm not sure about this one. It seems like this image has some kind of link to an application (ms-oleds)? My current guess is that double-clicking this image opens the editor where it was made.

image

Affects: Skyline Processing Grouped Study Data.docx

Overall Styling and Formatting Issues

There are bound to be things I have missed; it's hard to compare hundreds of pages of documents. Before making any of the manual fixes, I want to make sure everything that is worth automating is done. I will do some more manual checks, but I could use some help or ideas on what we can do here.

chambm commented 2 months ago

This is great work Eduardo! As you mentioned GitHub really struggles on how much has changed here. Could you split the changes up by tutorial and make a separate commit for each one (it's probably fine to do all 3 languages at once for each tutorial)? Then we could click only on the commit for a single tutorial and look at the changes only related to that tutorial (the commits that provide shared resources like CSS used by all tutorials would be earlier commits, of course). This would let us review each tutorial separately and point out issues using GitHub's commenting features. GitHub's main diff page would still show the aggregate changes, but it's easy enough to click on an individual commit to see only its changes.

brendanx67 commented 2 months ago

Let’s not create that many PRs right now, though. Could we maybe start with a few of the important translations to perfect the process in just one PR, and create others after that gets merged?

On Tue, Jul 30, 2024 at 7:49 PM Matt Chambers @.***> wrote:

@.**** commented on this pull request.

In pwiz_tools/Skyline/Documentation/SkylineStyles.css https://github.com/ProteoWizard/pwiz/pull/3097#discussion_r1697353031:

+table, th, td {
+  border: 1px solid black;
+  border-collapse: collapse;
+}
+table {
+  width: 100%;
+}
+p.keep-next {
+  -webkit-break-after: avoid;
+  break-after: avoid;
+}
+p.bibliography {
+  font-family: "Cambria";
+  margin:0;
+}
+p.subtitle {

Testing that I can comment on a line when viewing a single commit.


chambm commented 2 months ago

I didn't say PRs, I said commits. Each tutorial can be a commit in this PR and they can be reviewed separately (by clicking on the commits, not by looking at the "Files changed" tab, which is still going to be 2000+ files).

eduardo-proteinms commented 2 months ago

@chambm That's a good idea, let's give it a try. I created the commits in this PR. I did this with a script so I can redo it if we need to run the conversion again.

Skyline MS1 Filtering e8a05ec61b2a1ebc90abbe6e696ecbd71afb78d3
Skyline PRM Orbitrap 8104ca2642cf8ecabf15e7186111df943dc8c849
Skyline DIA PASEF 3b60f190815b26f529e023dea31ff6dbe9d07780
Skyline Small Molecule Quantification 4877dbe85c1de6dc26d818b2137fe4c96dc4e75c
Skyline Data Independent Acquisition 68a134615daeb450fb29d671782374650d5ff323
Skyline Targeted Method Refinement a3b134e3e7c8617af148aeb3974eb3be9b568737
Skyline Importing Assay Libraries 0fb22407cc7faab33c704731a7db752078629571
Skyline Spectral Library Explorer a62d3b9e2f30ecd1dcc8036d72702d02fe767214
Skyline DIA Umpire TTOF 4a997f9c3966bf9c7667c02c38edd05da7a866b5
Skyline DIA TTOF b8b47c52efd585dd3fd8fb52922f43101a493bac
Skyline DIA QE 8757a6c62691dbc89b95161346c6d94ff559dfe0
Skyline PRM cc52100caa6a4796c0977e973078a4adbb0a877a
Skyline Absolute Quantification 3188293e25e07933243113b98479be83b68fc130
Skyline Small Molecule Multidimensional Spectral Libraries f5cf39c2b3cd34a2fcea779a668b709529f51122
Skyline Existing and Quantitative Experiments bf87ef497e5574bdcf8b02e7f1f0db0a1782a717
Skyline PRM Orbitrap-PRBB-format 739b3f145be7c2522f5e8355a896cb682388c164
Skyline Targeted Method Editing a740ee32878b90ad609dc29a1973b0dd376c0ae5
Skyline Small Molecule Targets 69352504b87498c1469d358ba013756b2ee2663e
Skyline Advanced Peak Picking a32c07dc6e4d3fbd7ff04da9808f423212ad670e
Skyline Hi-Res Metabolomics ec4410a30500c17c612b82a7a1a3c8aa4359d038
Skyline Small Molecule Method Dev and CE Opt 63013d4f4055f28a1c85ccf61712a9ad11b1d0f6
Skyline Importing Integration Boundaries 8674c66bc00a542b3a1f5bef0716352710bf9026
Skyline Ion Mobility Spectrum Filtering 65bc2efa86507d6d77de9f036f0983d23fa09055
Skyline Collision Energy Optimization c946b3179c1fd773723963a68e8789e3c9d1afe8
Skyline iRT Retention Time Prediction 4ead0b5761d769cdbffa2d54ed0957ce59df6a2d
Skyline Processing Grouped Study Data e4d76f3db687fc1540e18e48cdd4dea27b833db6
Skyline Custom Reports b9c90b2a88d55810245abfeff416f7723435bedd
Skyline MS1 DDA Search ca6799ed1e5342897397700d494c81a148ef895f
Skyline Targeted MSMS c095be0054725665145bde0394465d0419a23cc2
Skyline Audit Logging 7e677b69a22110983e68318c04faa1b1a0065ce2
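A script for producing one commit per tutorial could be sketched roughly as follows. It assumes, hypothetically, one folder per tutorial under the tutorials directory and uses the tutorial name as the commit message; `commitPlan` and `TUTORIALS_DIR` are illustrative names, not the script that was actually used.

```javascript
// Repository-relative tutorials directory (forward slashes for git).
const TUTORIALS_DIR = "pwiz_tools/Skyline/Documentation/Tutorials";

// Build the git argument lists for one commit per tutorial.
// Assumption: each tutorial's generated files live in a folder named after it.
function commitPlan(tutorialNames) {
  return tutorialNames.map((name) => ({
    add: ["add", "--", `${TUTORIALS_DIR}/${name}`],
    commit: ["commit", "-m", name],
  }));
}

// Usage:
// const { execFileSync } = require("child_process");
// for (const step of commitPlan(["Skyline PRM", "Skyline DIA QE"])) {
//   execFileSync("git", step.add);
//   execFileSync("git", step.commit);
// }
```

Building the plan as data first makes it easy to rerun after another conversion pass.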

@brendanx67 I undid the delete of the docx files and moved them to the new folder we create. Also, I know we have metrics for lines of code, do you know if we have to do some exclusions for these new files?

eduardo-proteinms commented 1 month ago

Posted a new set of commits with each document after some changes:

shared ea17d3bf3462043a29155058f944fc25097e0453
Skyline Small Molecule Multidimensional Spectral Libraries cbcf2c81fb03e43a6e71a9c6d2611401df78f379
Skyline Existing and Quantitative Experiments 7b3dfc88cdbc4489c124d4d41e9c9e1eba4d39fe
Skyline Small Molecule Method Dev and CE Opt d1ed50d2611d130690dd1aa32b48dff896b20eef
Skyline Importing Integration Boundaries 636f597ece80100eedc0f9d503031cd6b9caf312
Skyline Ion Mobility Spectrum Filtering ed642ca1af1be80f656d8232f5c19e1c502146df
Skyline Collision Energy Optimization 6664f7b55dc3803d8cc855f73e4b58033ebdae2d
Skyline iRT Retention Time Prediction b2a954a318fd334184ce6e5f11d5796778fc4ce9
Skyline Processing Grouped Study Data 302aeb3eb07b927b9b66ca971e4f240d4e28304b
Skyline Small Molecule Quantification 15b67503c2f1b1b197479c8853e36b21f72746f6
Skyline Data Independent Acquisition 8da3727213fcd60a3a1f54298ac950792fd4e5ce
Skyline Targeted Method Refinement e4d44e5ab976f356d3804df9066f9327dae6348e
Skyline Importing Assay Libraries 823c0c5fafc71d3f6fded3dde0fad1ca1fde2bdc
Skyline Spectral Library Explorer 89d20dde4b095cedf85fd04456c52f31a2ad9117
Skyline PRM Orbitrap-PRBB-format 48aeb01c28cf6e6f9fb73d5e2ea634353d2f5232
Skyline Absolute Quantification 5f1d201d9f10278c1c64399c185d24df1b67f1f3
Skyline Targeted Method Editing 1989e900e88ec5d147bad9ea9571403c81a853ac
Skyline Small Molecule Targets 5d747bf74811edec622d3c8ebb214b75eae6dd65
Skyline Advanced Peak Picking ae9d1fbbac44687d878ea04d2bf5929b517d85fa
Skyline Hi-Res Metabolomics 54e36d54c791a35606fff823d7b67b197f8a6af9
Skyline DIA Umpire TTOF b3b3d64016334a642f35419dcdf4e7299f5a0b18
Skyline Custom Reports 0da87dd5a797fe35eeb8e2e9af2b958b5e89a6f1
Skyline MS1 DDA Search 75d812a2a46f244244b65f79bbf1ce4273561f23
Skyline Audit Logging 9cbecdf3708c43ab815188d91b7447f7bbe2476a
Skyline MS1 Filtering f44e400cf729dd2e09c287f2c1a0108d29a8d2c9
Skyline Targeted MSMS 8950c62460d589c9eb7dd617afc24ef4cbba1532
Skyline PRM Orbitrap 4c027166f7f734bf5c2c944bad89d0c5d5be798d
Skyline DIA PASEF df7cbbc658b4201207c3c5df9218382d22d1bc88
Skyline DIA TTOF b1941bc1cc23843f592795c26febcf88729e4579
Skyline DIA QE 244d8d65a57548c92561a526d66a9d1a51542d6d
Skyline PRM f3b1310cddc7d51672926fb88921adfa16140b18
Remove old documents 24af8bf9c489f6942e4de77e0d5a7afae7774356

eduardo-proteinms commented 1 month ago

Changed all documents to use a locale folder and updated the paths.

Shared Images d9e307c7b87b6c2d9043b05fcc98d7c310564074
Skyline Small Molecule Multidimensional Spectral Libraries e659d9b8874369388bdf35612377e4d69c50e7ff
Skyline Existing and Quantitative Experiments cd06f575b40195fb576a786070d118b816258d78
Skyline Small Molecule Method Dev and CE Opt df4661f3295e1e8bad83c057153765e6cd85ea8c
Skyline Importing Integration Boundaries f5bb62f52faeba3ce04209b330067b5c23a3ac16
Skyline Ion Mobility Spectrum Filtering 56356736f7ab60c9e9b04574150a15c783cdf245
Skyline Collision Energy Optimization a4ceb4adfd7dc8793a057fa6d4298667af4b2bb7
Skyline iRT Retention Time Prediction 639c616b531602dc39852d1ae68402dbd4e8213a
Skyline Processing Grouped Study Data ad71cb63f03487bf834980c85f2baab4374c021f
Skyline Small Molecule Quantification 24f22b2012a0ae8708597f0470f4c9ee9bb73155
Skyline Data Independent Acquisition 82b94201b1bc49e898f85ee1612c193fcfc4a3fa
Skyline Targeted Method Refinement 6de42e62ede62fc19106bf5d1f92b6aedb1a3788
Skyline Importing Assay Libraries 26ebbd664aefb7ab5b72fe21cad0dc3095491f76
Skyline Spectral Library Explorer 9c6be1ee23e02fb09e2c336328b96079dfa699dd
Skyline PRM Orbitrap-PRBB-format 71905e39086db00ce9a30b22f1a30ee1f1151435
Skyline Absolute Quantification b810bef160c27185e125d9155f02aa249a8b0cf6
Skyline Targeted Method Editing 25ef67b4b46dbc30874913261bff7c6a7a283a56
Skyline Small Molecule Targets 9f0c054feb15f8bc38e2be9c71e6ece77ab625da
Skyline Advanced Peak Picking c003c4d557f1d0f13e5edc59b795516a2de09b86
Skyline Hi-Res Metabolomics 41710571d45cf711a0180b5a466f0a832a3555c5
Skyline DIA Umpire TTOF c944eda722f79fd90ead009cb2e274b306aa718b
Skyline Custom Reports d6b1dc2eac199a0662f0e50d65e79b44d2551315
Skyline MS1 DDA Search a1723827fac54079069c254ae37c231dec6dade6
Skyline Audit Logging 078199ffed84b60d1ea6f45b1a0f79e64d071f96
Skyline MS1 Filtering 298eed08b3a00d05e80476f2303aaf059b1286da
Skyline Targeted MSMS 7e88ebe709595adaf538e378be01adcc032c453f
Skyline PRM Orbitrap c0d52be063fb5722d6e7c978ed41540cfb873790
Skyline DIA PASEF a6e09bc7a071394869957535e13b4d7dbc0d38ad
Skyline DIA TTOF cbec81709b124bc36ce55feeff672c950803b417
Skyline DIA QE a5445d4eef8b61d797e7dc414c833333969e9a82
Skyline PRM 3a8277b4f97c22c2ab545ba6409d2637ce134243
Remove docx files 0f3bbccae5e6f5eb062352ba244978986455d763

eduardo-proteinms commented 1 month ago

I have updated all the docx tutorial files so that anything that required a manual fix has been simplified to something the conversion supports.

This latest revision has the following fixes:

I have created this document to help me keep track of any issues we find: https://docs.google.com/document/d/1JzW7b6SeG9MK4UsLcVwgC64lgLlhEd4kQjrQaCDPdLY/edit?usp=sharing

This PR should be ready to merge as a good starting point for HTML tutorials. Contributions to the validation are still welcome, as the conversion is still fully automated.

Here are the split-up commits if you want to take a look at single conversions:

shared d0d0ca4734cba1b1f9177cd8e30aecd49b344d75
ImportingIntegrationBoundaries a80ec8a6855095b51e2cd1d2208daf1d219c7db9
SmallMoleculeMethodDevCEOpt f1a19d255c56ca2fda5f217091497e5576e131f5
SmallMoleculeQuantification 9afeda4ac007d017cd5ec0f0919a519149fe95af
SmallMoleculeIMSLibraries cc4a92c142b8b77e81ed0d11a07e67d7f8c8c4a8
ImportingAssayLibraries cd5778859d5f90c55cff841f3be2701173250499
HiResMetabolomics 1d196c905215c3d083f23f2e3555799181852910
PRMOrbitrap-PRBB cb9871b88ba3de0d22d75a9bb59f2f4f11acf452
DIA-Umpire-TTOF 46d3378dbe9e9c7c79f93d47bf221db1c9dfcdd6
LibraryExplorer d4972416f1a6c6d58b49bf6bd6d0ac2ee61e5e7b
GroupedStudies 10e2bf064e19ecdf5f3a046804565e37eaff25cf
AbsoluteQuant 596822de6c866e66aa8efc53d933a32f63d07b22
CustomReports c9eaa685d0077d3f6ec4b10ba1280f62aa113d88
ExistingQuant f9ce1129c0230da17cee89379dfa86dfbab7944d
SmallMolecule e1f428365004a94df1edf6a17869a72989559160
IMSFiltering 521490d3bf49a52e2832e7402d8a7fd67c33b6da
MethodRefine 246b6817570a929958858becc02ece591eb30c73
MS1Filtering 29cb4c0cdfc33b7a490adcc3aa244828a2729275
PeakPicking 7458cb8788253fa22f54c59a870a877a6d43dd08
PRMOrbitrap e14aef10c52e211db5d04395da9b16ffedaa8e02
MethodEdit 82e6bdf39a511621c67e7630832507f5ec43fa13
OptimizeCE f9d9fdc5669cb0c1f57023fb896d95f9ab767aff
DDASearch c73a38f0752483c8fc5b5c7a45f266df06424520
DIA-PASEF 63b2c009bb62e19e299720d8191468707ede07d5
AuditLog 52a9b49240556614f4a29203ed4ca24b01390fa0
DIA-TTOF 6ac389c71a9662c4fa6c694b77e41c9f906841ae
DIA-QE 2f6a8c3ca82221c739fe828c88f2c1d0fe0b34dd
DIA eb6f270056b894081efc02e1c6645557f97b198f
iRT 81f2d2cafdde0b01bd63e6316ba2d2b7b31bf4b8
PRM 09431504b98ef1c16450215ad5eaec6afb13868f
Deleted existing files 49c7accb17b0b43cb99ba45c6830965047c43ebd
Update stylesheet 16ba0d2fb76380d8a91ec774de82aa5492b3cba8

eduardo-proteinms commented 1 month ago

Posted the latest document updates, which include a lot of fixes and changes we've discussed offline. The latest set includes invariant HTML documents.

shared f734a5e6ab0d708155564b0ce53d911a0c30549e
ImportingIntegrationBoundaries b7554a26cab5d0edd7b7afc5e1ec56d7ff54e7d1
SmallMoleculeMethodDevCEOpt 8c9a1da36fa22890868c6215f627ee98f15958d9
SmallMoleculeQuantification 49d6c3493d0a4603ec5204b2c1df8b57a1e7a9dc
SmallMoleculeIMSLibraries e1a64857e5a26cd125e16e89619669632e837d7e
ImportingAssayLibraries 86f374f05803cf22d030613b959d269dcd69b6b1
HiResMetabolomics 83ada4c069db743e4cfd4575f43cb19812ca0ac1
PRMOrbitrap-PRBB 5ecef444d1d53fa7fba2afce8430cce7cb5f9ddd
DIA-Umpire-TTOF ceeb1e6f0be3380cd027a18a8192e755979b9bf0
LibraryExplorer 42d113ca5d04e96d073a8486f9605e7460540944
GroupedStudies 9ad1281ac256e62df0892de53da553e99a24a247
AbsoluteQuant c08abff802623e0ea8a3ca8e1a9408a51e41331f
CustomReports 54350fa841bcffaf61139da6688341c1a3b5251e
ExistingQuant 9586fb67bc6955786344ffa7da788b80866168c6
SmallMolecule bf0744b60f3ba76a3bd1a92a8618828ffdde3b18
IMSFiltering f00146871e303e74fdbecd19e76b585673676ef1
MethodRefine 1ccbe42651cbebba8ed187c7da719c8e33ec308d
MS1Filtering 57f71dc7a85bc35c2db2b87b8a94f8991e9e2a64
PeakPicking 41511d5df6625f4a9c2da43fb168ff95df48f951
PRMOrbitrap 69e30dfae11435bed1b9e0212d56d412648846f0
MethodEdit e675298fb7c871f8e451857662019438b97c355d
OptimizeCE 164e7c64b3f70e3fc294610f313633dd159259ae
DDASearch 1b2e86ef6a76aaf43b211bb0b6189db8e917e1f5
DIA-PASEF b0ef64142c80050450274e4a63ed673a185cf0d1
AuditLog 114e4b744dccb3ef7d661ad9d9d5b308f4a217fb
DIA-TTOF d70487092ef622cfa390e73cb828e2a0f3830805
DIA-QE 5a9e4000ff2d56a2ff6c8789be69437a1671fdb8
DIA c4e4a71fdf4af0cb3990210a993d237cb1c33395
iRT 6ee2af841cfa17e5eb1d76454bf0d64972f8de23
PRM 7b014aac9f126227755aaf9e26fe63d540302188
Remove old documents 3dec01b4539022c87ea85f6b734a8b337cffb951