bettermusic / ChordSheetJS

A JavaScript library for parsing and formatting ChordPro chord sheets
GNU General Public License v2.0

💡 RFC: PDF Formatter #186

Closed isaiahdahl closed 4 months ago

isaiahdahl commented 2 years ago

Background & Motivation

Generate PDFs, ideally in the browser.

If we can figure out how to do this well in the browser, the library will be so much easier to maintain in the open source world.

The challenge with doing this in the browser is that the only HTML-to-PDF generators I can find for Node just render to a canvas; they don't create PDFs with real text content in them, which I think is valuable.

We need to weigh the complexities of these tech requirements:

Proposed Solution

Possible Solutions

  1. Integration with PDFKit
    • Description: PDFKit is a JavaScript package for creating PDFs. Example:
      
      const PDFDocument = require('pdfkit');
      const fs = require('fs');

      // Create a document
      const doc = new PDFDocument();

      // Pipe its output somewhere, like to a file or HTTP response
      // See below for browser usage
      doc.pipe(fs.createWriteStream('output.pdf'));

      // Embed a font, set the font size, and render some text
      doc
        .font('fonts/PalatinoBold.ttf')
        .fontSize(25)
        .text('Some text with an embedded font!', 100, 100);

      // Add an image, constrain it to a given size, and center it vertically and horizontally
      doc.image('path/to/image.png', {
        fit: [250, 300],
        align: 'center',
        valign: 'center'
      });

      // Add another page
      doc
        .addPage()
        .fontSize(25)
        .text('Here is some vector graphics...', 100, 100);

      // Draw a triangle
      doc
        .save()
        .moveTo(100, 150)
        .lineTo(100, 250)
        .lineTo(200, 250)
        .fill('#FF3300');

      // Apply some transforms and render an SVG path with the 'even-odd' fill rule
      doc
        .scale(0.6)
        .translate(470, -380)
        .path('M 250,75 L 323,301 131,161 369,161 177,301 z')
        .fill('red', 'even-odd')
        .restore();

      // Add some text with annotations
      doc
        .addPage()
        .fillColor('blue')
        .text('Here is a link!', 100, 100)
        .underline(100, 100, 160, 27, { color: '#0000FF' })
        .link(100, 100, 160, 27, 'http://google.com/');

      // Finalize PDF file
      doc.end();



   - **Integration Plan**: 
     - Create a PDF from a Song.
     - Every change would regenerate a new PDF and stream it somewhere.
     - The formatter would have a configuration to handle all of the configurable elements.
     - Not sure how custom line-breaking logic fits into this approach.

2. **Leveraging Web Assembly with pdfium**
   - **Description**: PDFium is the C++ PDF library used by the Google Chromium project. It can render pages in PDF files to bitmaps; load, edit, and extract text and images from existing PDF files; and create new PDF files from scratch.
   - It seems this library can be compiled to wasm and used for both rendering and creating PDFs, so it could possibly benefit the studio library as well for actually rendering the PDFs.
   - I don't really have any experience with it, and it would require more trial and error to make work.
   - Would likely be faster than a pure JavaScript generator.

3. **Leveraging Web Assembly with wasm-pdf**
   - **Description**: wasm-pdf is a promising Web Assembly package that could offer faster PDF generation due to Web Assembly's performance benefits. This option involves using the wasm-pdf package to generate PDF documents from chord sheets.
   - The package is no longer maintained and seems to have been a bit of a passion project, but the fundamentals are there.
   - I think the only reason we would consider this over the JavaScript package (PDFKit) is if we thought PDFKit's speed couldn't keep up with how much generating we have to do.

### Alternatives Considered

1. **HTML to PDF Conversion**
   - Utilizing the existing HTML formatters in ChordSheetJS, we can consider converting HTML output to PDF. This method might involve less development effort but can pose challenges in achieving precise layouts and styles.
   - The HTML-to-PDF packages I have explored all sit on top of pdf.js and just create a canvas and then turn that into an image PDF, which isn't really ideal.

### Risks, Downsides, and/or Tradeoffs

- **PDFKit Integration**: 
  - Proven technology with rich feature set.

- **wasm-pdf Integration**: 
  - Potential for better performance due to Web Assembly's efficiency.
  - Less mature and not well supported, which could lead to potential issues in the future.

- **Other WASM Packages I haven't explored**
  -  https://crates.io/crates/pdfium-render

- **HTML to PDF Conversion**: 
  - Might not offer precise control over layouts and styles.
  - Potential issues with complex chord sheets.
  - Not really a true HTML to PDF generation, uses Canvas and "images" of the HTML

### Open Questions

1. Will PDFKit be fast enough for what we'll need it to do?
isaiahdahl commented 11 months ago

@martijnversluis Updated this RFC with more thoughts and detail of what I've accumulated so far.

isaiahdahl commented 11 months ago

Interesting feature from pdfkit https://pdfkit.org/docs/text.html that could help with linebreaking logic!

> Text measurements
>
> If you're working with documents that require precise layout, you may need to know the size of a piece of text. PDFKit has two methods to achieve this: widthOfString(text, options) and heightOfString(text, options). Both methods use the same options described in the Text styling section, and take into account the eventual line wrapping.
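To illustrate how a width-measuring call could drive line breaking, here is a minimal sketch. `breakIntoLines` and `fakeWidth` are hypothetical names, not part of PDFKit or ChordSheetJS. In practice the `measure` argument might wrap something like `doc.widthOfString(text)`; that wiring is an assumption, and the stand-in measurer below exists only so the example runs on its own.

```javascript
// Hypothetical helper: greedy word wrapping driven by a caller-supplied
// measuring function (for example a wrapper around PDFKit's widthOfString).
function breakIntoLines(text, maxWidth, measure) {
  const words = text.split(/\s+/).filter(Boolean);
  const lines = [];
  let current = '';

  for (const word of words) {
    const candidate = current ? `${current} ${word}` : word;
    if (current && measure(candidate) > maxWidth) {
      // The word does not fit: close the current line and start a new one.
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }

  if (current) lines.push(current);
  return lines;
}

// Stand-in measurer for this example only: pretends every character is 6pt wide.
const fakeWidth = (s) => s.length * 6;

const wrapped = breakIntoLines('Amazing grace how sweet the sound', 90, fakeWidth);
console.log(wrapped); // [ 'Amazing grace', 'how sweet the', 'sound' ]
```

A single word wider than `maxWidth` is left on its own line rather than split, which is a simplification.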

isaiahdahl commented 11 months ago

The more I actually look at how pdfkit could work for rendering a chord chart, the less I think it'll work. It seems like it would be quite hard to get chord-lyric pairs to line up with each other when using non-monospace fonts.

isaiahdahl commented 11 months ago

@martijnversluis I played around with the wasm-pdf library a bit more and made a fork to get a better idea of how the JSON to PDF works.

If you clone it and get it running locally you can see how fast it takes the JSON and generates a PDF from it.

Obviously it's all Rust code and isn't test covered or really that clean, but it's worth exploring.

https://github.com/bettermusic/wasm-pdf

image
martijnversluis commented 9 months ago

@isaiahdahl I'm currently trying wasm-pdf, seeing what it would look like to render a song with it.

isaiahdahl commented 9 months ago

@martijnversluis I'm open to hiring a Rust developer to implement some requirements if we can see a path forward.

My thinking for line- and page-breaking logic is that we'd need the supported fonts registered in a config, where each font's characters have width/height properties we can use to calculate the widths and heights of lines.

This is already in line with how it's working I think.
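A font registry along these lines could be sketched as follows. The names (`fontMetrics`, `textWidth`, `lineHeight`) and every number are made up for illustration; real widths would come from the font files, and none of this is taken from wasm-pdf.

```javascript
// Hypothetical registry: advance widths in 1/1000 em units per character.
// The numbers below are illustrative, not real font data.
const fontMetrics = {
  'Example-Sans': {
    defaultWidth: 500,
    lineHeight: 1.2, // multiple of the font size
    widths: { ' ': 250, i: 220, l: 220, m: 830, A: 670, G: 780 },
  },
};

// Width of a string at a given font size, in the same unit as fontSize.
function textWidth(text, fontName, fontSize) {
  const font = fontMetrics[fontName];
  if (!font) throw new Error(`Font not registered: ${fontName}`);
  let units = 0;
  for (const ch of text) {
    // Fall back to the default width for characters without an entry.
    units += font.widths[ch] ?? font.defaultWidth;
  }
  return (units / 1000) * fontSize;
}

// Height of one line at a given font size.
function lineHeight(fontName, fontSize) {
  const font = fontMetrics[fontName];
  if (!font) throw new Error(`Font not registered: ${fontName}`);
  return font.lineHeight * fontSize;
}

console.log(textWidth('Am', 'Example-Sans', 10)); // 15
console.log(lineHeight('Example-Sans', 10));
```

This ignores kerning and ligatures, so it would only approximate what a real font engine reports.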

So the idea would be to generate a JSON model specific to the wasm-pdf library, and to extend the wasm library so that it renders things exactly as we need.

The barrier I saw when playing around with it was getting the chord above the lyric to properly push the next lyric over so that the full chord finishes before the next lyric starts.

This could be done with table logic, except that the rendering logic would have to change so that table rows don't try to consume the full available width of their container.

In the canvas.rs file there's a function draw_table_row that currently calculates cell_width by taking the full available width and dividing it by the number of columns.

If that logic were adjusted so that rows were simply left-justified and didn't care whether each row had the same number of columns, maybe there is something there.
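The idea of sizing each cell from its content, rather than dividing the row evenly, can be sketched in JavaScript (this is not the Rust code in `canvas.rs`). `layoutPairs`, the `gap` parameter, and the fixed-width measurer are assumptions made for illustration.

```javascript
// Hypothetical sketch: place chord/lyric pairs left to right, giving each
// pair the width of its wider element so a long chord pushes the next lyric over.
function layoutPairs(pairs, measure, gap = 4) {
  let x = 0;
  return pairs.map(({ chord, lyrics }) => {
    // A chord reserves its own width plus a small gap before the next chord.
    const chordWidth = chord ? measure(chord) + gap : 0;
    const lyricWidth = measure(lyrics);
    const cell = { chord, lyrics, x, width: Math.max(chordWidth, lyricWidth) };
    x += cell.width; // the next cell starts where this one ends, no stretching
    return cell;
  });
}

// Stand-in measurer for this example only: 6pt per character.
const fixedWidth = (s) => s.length * 6;

const cells = layoutPairs(
  [
    { chord: 'Gmaj7', lyrics: 'A' },
    { chord: 'C', lyrics: 'mazing ' },
    { chord: '', lyrics: 'grace' },
  ],
  fixedWidth,
);
console.log(cells.map((c) => [c.x, c.width])); // x/width pairs: [0, 34], [34, 42], [76, 30]
```

In the first pair the chord is wider than the lyric, so the second lyric starts at x = 34 instead of x = 6, which is the "chord finishes before the next lyric starts" behavior described above.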

martijnversluis commented 9 months ago

That sounds great! I will continue my experiment, that will show what the library is lacking.

martijnversluis commented 9 months ago

So far I have been able to render an ok chart. I have to find a way to turn off the 100% width for tables.

EDIT: I don't think turning off the 100% stretch for tables is possible; it looks like it's hardcoded.

image

isaiahdahl commented 9 months ago

I mean it's a start! What are your thoughts so far on this direction?

Yea, I came to that conclusion as well when following the code. That's the area that would have to be rewritten to fork off from just rendering a table and instead allow rows to be left-justified, with each row able to have a different number of columns. Then, when it calculates the cell_width, it would base it on the contents of the cell, not on the row width and column count.

Do you want to maybe setup a quick call to review some thoughts and talk it through?

martijnversluis commented 9 months ago

So far this worked well. All fonts etc. are customizable, which would be quite a "must" from a ChordPro perspective. If this were our pick, we would probably need a Rust developer to get the package to a higher level and change it according to our needs.

That being said, this week I did some testing with jsPDF. So far the result is comparable with that of wasm-pdf. See this Codepen. I have yet to discover if I can better customize the table rendering.

image
isaiahdahl commented 9 months ago

Can we schedule a meeting to review and discuss a plan forward?

isaiahdahl commented 9 months ago

jsPDF looks interesting for sure, seeing how it has functions for line width and text measuring. Tons of potential here. Nice find!

martijnversluis commented 9 months ago

> Can we schedule a meeting to review and discuss a plan forward?

Yes, seems like a good idea. I'll check your calendar to schedule something.

isaiahdahl commented 7 months ago

image

[Kingdom (Kirk Frankli...) Chord Chart - F - 2 Column Layout.pdf](https://github.com/bettermusic/ChordSheetJS/files/13960347/Kingdom.Kirk.Frankli.Chord.Chart.-.F.-.2.Column.Layout.pdf)

Kingdom (Kirk Frankli...) Chord Chart - F - 1 Column Layout.pdf

isaiahdahl commented 7 months ago

Next Steps

The next goal is to get a PDF rendering that, for the most part, mirrors what PraiseCharts can currently do.

isaiahdahl commented 7 months ago

hey @martijnversluis so I messed around today to try and get a pdf rendering dev workflow working and ended up getting something.

image

The branch in this repo is feat/pdf-renderer.

Basically you just need to run yarn install, then open two terminal windows and run yarn dev in one (to watch the lib) and yarn dev:pdf in the other (to serve the simple display at http://localhost:3000).

Changes made to the formatter rebuild the webview and it's got a simple codemirror for interactive chordpro.

Maybe you already started and got something else working but figured I'd at least share!

isaiahdahl commented 7 months ago

@martijnversluis Well I couldn't help myself this evening and took a stab at doing the approach where the placement of the text is calculated using that getTextDimensions function. Got pretty far even with preliminary support for going to multiple pages and multi columns.

image

Pushed my code to that feat/pdf-renderer branch I started.

Put in a performance measuring timestamp as well and what's pretty awesome is that a big long chart like this one renders the PDF in ~0.03s. That's with all the individual words and chords having their width calculated. Pretty stoked about this path and can see how with some refining the configuration could make literally everything from placement of title, artists etc... driven from the config.
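One way to think about the multi-column, multi-page placement mentioned above is a cursor that moves down a column, then to the next column, then to a new page. The sketch below is an assumption about the general approach, not the code on the feat/pdf-renderer branch; `createCursor` and the layout fields are hypothetical.

```javascript
// Hypothetical layout cursor: returns the position for each block of a given
// height, moving to the next column or page when the current one is full.
function createCursor({ pageHeight, margin, columns, columnWidth, columnGap }) {
  let page = 1;
  let column = 0;
  let y = margin;

  return function place(blockHeight) {
    const overflows = y + blockHeight > pageHeight - margin;
    if (overflows && y > margin) {
      // Current column is full: advance to the next column, or to a new page.
      column += 1;
      y = margin;
      if (column >= columns) {
        column = 0;
        page += 1;
      }
    }
    const position = { page, x: margin + column * (columnWidth + columnGap), y };
    y += blockHeight;
    return position;
  };
}

const place = createCursor({
  pageHeight: 100, margin: 10, columns: 2, columnWidth: 40, columnGap: 5,
});
const positions = [30, 30, 30, 30, 30].map((h) => place(h));
console.log(positions);
```

With these numbers, two blocks fit per column, so the third block lands at the top of the second column and the fifth at the top of page 2. A block taller than a whole column is placed anyway at the top of a fresh column, which is a simplification.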

Let me know your thoughts!

martijnversluis commented 7 months ago

Wow, that's a great result! Tbh, I didn't start with anything yet, so I'm happy there's already a working base!

I created a PR so we can iterate on it. There are a few areas where we can improve/simplify by using existing helpers, such as interpolating metadata.

martijnversluis commented 6 months ago

The header/footer templates: is that how the current PDF generation is configured?

template: 'Key of ${key} - BPM ${bpm} - Time ${timeSignature}'

We might want to consider using the chordpro interpolation syntax instead to make rendering simpler:

template: 'Key of %{key} - BPM %{tempo} - Time %{time}',
martijnversluis commented 6 months ago

Thinking about it, it might also just be a function using a template literal:

template: song => `Key of ${song.key} - BPM ${song.bpm} - Time ${song.time}`
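
To make the two options concrete, here is a minimal sketch that substitutes `%{name}` placeholders from a metadata object, next to the function form. `interpolate` is a hypothetical helper that handles only plain `%{name}` placeholders; ChordPro's full substitution syntax has more features, and this is not ChordSheetJS's existing helper. The metadata values are made up.

```javascript
// Hypothetical helper: replaces plain %{name} placeholders with metadata values.
// Unknown names become an empty string.
function interpolate(template, metadata) {
  return template.replace(/%\{(\w+)\}/g, (_match, name) => {
    const value = metadata[name];
    return value === undefined || value === null ? '' : String(value);
  });
}

const metadata = { key: 'F', tempo: 120, time: '4/4' };

// String template using ChordPro-style placeholders.
const fromString = interpolate('Key of %{key} - BPM %{tempo} - Time %{time}', metadata);

// Function template using a JavaScript template literal.
const footerTemplate = (song) => `Key of ${song.key} - BPM ${song.tempo} - Time ${song.time}`;
const fromFunction = footerTemplate(metadata);

console.log(fromString);   // Key of F - BPM 120 - Time 4/4
console.log(fromFunction); // Key of F - BPM 120 - Time 4/4
```

The string form can live in a JSON config file, while the function form needs the config to be JavaScript; that trade-off is worth weighing when choosing between them.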
martijnversluis commented 6 months ago

@isaiahdahl I'm wondering, on what did you base the structure of the formatter configuration? So far I've tried to abide by the default Chordpro configuration, what do you think?

isaiahdahl commented 6 months ago

> The header/footer templates: is that how the current PDF generation is configured?
>
> template: 'Key of ${key} - BPM ${bpm} - Time ${timeSignature}'
>
> We might want to consider using the chordpro interpolation syntax instead to make rendering simpler:
>
> template: 'Key of %{key} - BPM %{tempo} - Time %{time}',

Yes, this makes perfect sense. I just happened to be in a PHP/Angular mode when writing that the first time, but it makes sense to match the chordpro interpolation syntax.

isaiahdahl commented 6 months ago

> @isaiahdahl I'm wondering, on what did you base the structure of the formatter configuration? So far I've tried to abide by the default Chordpro configuration, what do you think?

I was operating under more of a MVP/POC mindset when making it work (just throwing stuff in the config in a somewhat thought through way and then plan to refactor it once all the configurable stuff is in there), but also with the assumption that we wouldn't be trying to match the chordpro configuration from their implementation exactly and that we would be creating our own.

I sorta assumed that their configuration would have too many things specific to their way of doing things, and so was thinking of it more as inspiration than as something to stick to.

But in saying that I didn't have a super close look through the entirety of that configuration.

After giving it a closer look, I still feel like trying to match the configuration that works for their program will be limiting, though I think there are sections of it where it makes sense to not "re-invent the wheel" and use a configuration/object like they have it defined.

For example, the concept of the "layout" object I had made seems like a much more flexible and beneficial improvement on what's possible within their config.

thoughts?

isaiahdahl commented 5 months ago

Consider one unified configuration for all formatters.