dglazkov / polymath

Google Docs and Google Slides importer #39

Open jkomoros opened 1 year ago

jkomoros commented 1 year ago

Ideally it would be possible to enumerate some Google Docs and Google Slides you own and have it import the content.

I'd love to have for example https://komoroske.com/gardening-platforms and https://komoroske.com/slime-mold in it.

For slides, it can just pull the text out of any text runs, plus the speaker notes.

For docs it should be straightforward.
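
One way to approach the enumeration step is to list Drive files and filter on MIME type. The sketch below is not from the thread: `buildOwnedFilesQuery` and `partitionByType` are hypothetical helper names, and it assumes the file list has the `{id, name, mimeType}` shape that a Drive API `files.list` call returns. The two MIME type strings are the ones Drive uses for native Docs and Slides files.

```javascript
// MIME types Google Drive assigns to native Docs and Slides files.
const DOC_MIME = 'application/vnd.google-apps.document';
const SLIDES_MIME = 'application/vnd.google-apps.presentation';

// Hypothetical helper: builds a Drive search query for Docs and Slides
// owned by the calling user, excluding trashed files.
function buildOwnedFilesQuery() {
  return `(mimeType = '${DOC_MIME}' or mimeType = '${SLIDES_MIME}') ` +
    `and 'me' in owners and trashed = false`;
}

// Hypothetical helper: splits file metadata into docs and slides,
// ignoring every other file type.
function partitionByType(files) {
  const result = { docs: [], slides: [] };
  for (const file of files) {
    if (file.mimeType === DOC_MIME) {
      result.docs.push(file);
    } else if (file.mimeType === SLIDES_MIME) {
      result.slides.push(file);
    }
  }
  return result;
}
```

The query string would be passed as the `q` parameter of a file listing call, and each bucket could then go to its own importer.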

jkomoros commented 1 year ago

I asked GPT what the code should be:

function extractTextFromSlides() {
  // Intended to run as an Apps Script bound to the presentation being read.
  var presentation = SlidesApp.getActivePresentation();
  var slides = presentation.getSlides();
  var text = "";

  for (var i = 0; i < slides.length; i++) {
    var slide = slides[i];
    var elements = slide.getPageElements();

    for (var j = 0; j < elements.length; j++) {
      var element = elements[j];
      var type = element.getPageElementType();

      if (type == SlidesApp.PageElementType.SHAPE) {
        // Shape.getText() returns a TextRange; asString() yields its raw text,
        // which covers every run in the shape.
        var shapeText = element.asShape().getText();
        if (shapeText) {
          text += shapeText.asString();
        }
      } else if (type == SlidesApp.PageElementType.TABLE) {
        var table = element.asTable();

        // Tables are read by index: getNumRows()/getRow(), getNumCells()/getCell().
        for (var r = 0; r < table.getNumRows(); r++) {
          var row = table.getRow(r);

          for (var c = 0; c < row.getNumCells(); c++) {
            var cellText = row.getCell(c).getText();
            // Skip cells that expose no text range.
            if (cellText) {
              text += cellText.asString();
            }
          }
        }
      }
    }

    // Speaker notes live in a dedicated shape on the slide's notes page.
    var speakerNotes = slide.getNotesPage().getSpeakerNotesShape().getText();
    if (speakerNotes) {
      text += speakerNotes.asString();
    }
  }
  Logger.log(text);
}

dglazkov commented 1 year ago

Ain't bad.

dglazkov commented 1 year ago

I wonder if this might be a good approach: https://developers.google.com/docs/api/samples/extract-text#python
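
For comparison with the Apps Script above, here is a minimal JavaScript sketch of the same general idea as that link: walk the document JSON that the Docs API `documents.get` method returns and concatenate its text runs. The function names are hypothetical and this is not a port of the linked sample. It assumes only the documented response shape (`body.content`, `paragraph.elements`, `textRun.content`, `table.tableRows`, `tableCells`, `tableOfContents`).

```javascript
// Concatenates the text runs of one paragraph's elements.
function readParagraphElements(elements) {
  let text = '';
  for (const element of elements || []) {
    if (element.textRun && element.textRun.content) {
      text += element.textRun.content;
    }
  }
  return text;
}

// Recursively walks structural elements: paragraphs, tables, and
// tables of contents. Other element kinds (e.g. section breaks) are skipped.
function readStructuralElements(content) {
  let text = '';
  for (const item of content || []) {
    if (item.paragraph) {
      text += readParagraphElements(item.paragraph.elements);
    } else if (item.table) {
      // Each table cell holds its own list of structural elements.
      for (const row of item.table.tableRows || []) {
        for (const cell of row.tableCells || []) {
          text += readStructuralElements(cell.content);
        }
      }
    } else if (item.tableOfContents) {
      text += readStructuralElements(item.tableOfContents.content);
    }
  }
  return text;
}

// Hypothetical entry point: `doc` is the parsed JSON of a Docs API document.
function extractDocText(doc) {
  return readStructuralElements(doc.body && doc.body.content);
}
```

Because it works on plain JSON, this can be exercised without any Google credentials by feeding it a saved API response.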

jkomoros commented 1 year ago