dvargas92495 / SmartBlocks

Useful examples from developer community for Roam42 SmartBlocks

Get full page text & simple term frequencies, POC for future NLP extensions #156

Status: Open. Opened by dnlmc 3 years ago.


## ✂️ Copy of your #42SmartBlock from Roam

```javascript
// Runs as the body of a Roam42 SmartBlock JavaScript command; the string
// returned at the end is what the SmartBlock outputs.

// Get all rendered blocks on the current page(s) by div class (can probably be refined).
var allBlocks = document.querySelectorAll('.rm-blockinput.rm-blockinput--view.roam-block.dont-unfocus-block.hoverparent.rm-block-text');

// A NodeList has no .map(), so convert it to a real array first.
var blockArray = Array.prototype.slice.call(allBlocks);

// Collapse the innerText of every block div into a single string.
var allText = blockArray.map(function (e) { return e.innerText; }).join(' ');

// Split into words on any run of whitespace (spaces, tabs, newlines) and drop
// empty strings. Further normalization could go here, e.g. forcing lowercase
// or handling punctuation and special characters.
var wordArray = allText.split(/\s+/).filter(Boolean);

// Count word frequencies. A prototype-less object avoids collisions with
// inherited keys such as "constructor" or "toString".
var wordCounter = Object.create(null);
for (var i = 0; i < wordArray.length; i++) {
  var word = wordArray[i];
  wordCounter[word] = (wordCounter[word] || 0) + 1;
}

// Take each distinct word once and sort by frequency, highest first
// (this replaces sorting every occurrence and deduplicating afterwards).
var uniqueWords = Object.keys(wordCounter).sort(function (a, b) {
  return wordCounter[b] - wordCounter[a];
});

// Return one "word: count" line per distinct word.
return uniqueWords.map(function (w) { return w + ': ' + wordCounter[w]; }).join('\n');
```
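The counting and sorting steps above don't touch Roam's DOM, so they can be pulled out into plain functions and checked outside Roam. This is a minimal sketch assuming a modern JavaScript runtime (`Map`, spread syntax, stable `Array.prototype.sort` from ES2019); `termFrequencies` and `formatFrequencies` are hypothetical names, not part of Roam42.

```javascript
// Hypothetical helper: the DOM-independent part of the SmartBlock.
// Takes raw text, returns [word, count] pairs sorted by count, highest first.
function termFrequencies(text) {
  // Split on any whitespace run and drop empty strings.
  const words = text.split(/\s+/).filter(Boolean);

  // A Map avoids clashes with keys a plain {} inherits, like "constructor".
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }

  // Map iterates in first-seen order and sort is stable (ES2019+),
  // so words with equal counts keep the order they first appeared in.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical helper: same output format as the SmartBlock,
// one "word: count" line per distinct word.
function formatFrequencies(pairs) {
  return pairs.map(([word, count]) => word + ': ' + count).join('\n');
}

console.log(formatFrequencies(termFrequencies('the cat saw the dog and the cat')));
// the: 3
// cat: 2
// saw: 1
// dog: 1
// and: 1
```

Using a `Map` (or `Object.create(null)`) matters more than it looks: with a plain `{}`, a page containing the word "constructor" would hit the inherited `Object` function, and `+= 1` would turn that count into a string.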



## 📋 Describe the SmartBlock
<!-- Short and concise description of how the SmartBlock works and its purpose -->
Proof of concept that gets the full text of all blocks rendered on the current page, then produces simple term-frequency counts. Partly inspired by Tiago's question during his and Conor's [Peace Summit](https://youtu.be/-Aqg9Z5gWNg?t=1231). Meant as an initial foray into further NLP applications, likely by importing a [JS NLP library](https://www.kommunicate.io/blog/nlp-libraries-node-javascript/), as demonstrated here: https://github.com/roamhacker/SmartBlocks/issues/127. I probably won't have much time for this myself, so I'm happy for others to run with it!

Obvious refinements:
1. Preprocessing / cleaning / stemming / stop-word removal
2. Only return the top N words, or words with counts > N
3. Enable n-grams as opposed to only single words (unigrams)
4. Provide a page or block reference to process, rather than the current page(s) / blocks in view
5. Maybe an option to exclude linked / unlinked references
6. Return as an actual table?
7. Import a [JS NLP library](https://www.kommunicate.io/blog/nlp-libraries-node-javascript/) for deeper functionality
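Refinements 1 to 3 are plain string processing, so they could be sketched as small helpers applied to the `allText` string the SmartBlock already builds. This is a minimal sketch under stated assumptions: `tokenize`, `ngrams`, `topTerms` and the tiny `STOP_WORDS` set are hypothetical, tokenization is a simple regex split that keeps letters, digits and apostrophes (Unicode property escapes, so accented letters survive), and stemming is left out since it would more sensibly come from a library as in item 7.

```javascript
// Hypothetical helpers sketching refinements 1-3; not part of Roam42.
// Input would be the `allText` string the SmartBlock already builds.

// Tiny illustrative stop-word list (an assumption); real lists are much longer.
const STOP_WORDS = new Set(['a', 'an', 'and', 'the', 'of', 'to', 'in', 'is', 'it']);

// 1. Normalize: lowercase, turn anything that isn't a letter, digit, apostrophe
//    or whitespace into a space, split on whitespace, drop empties and stop words.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}'\s]+/gu, ' ')
    .split(/\s+/)
    .filter((word) => word !== '' && !STOP_WORDS.has(word));
}

// 3. N-grams: every run of n consecutive tokens, joined by a space
//    (n = 1 returns the unigrams unchanged).
function ngrams(tokens, n) {
  const grams = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.push(tokens.slice(i, i + n).join(' '));
  }
  return grams;
}

// 2. Count terms, drop those below minCount, and keep at most `limit`,
//    highest count first.
function topTerms(terms, limit, minCount = 1) {
  const counts = new Map();
  for (const term of terms) {
    counts.set(term, (counts.get(term) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count >= minCount)
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

const tokens = tokenize('The cat saw the dog. The cat ran!');
console.log(topTerms(tokens, 3));             // top 3: cat (2), saw (1), dog (1)
console.log(topTerms(ngrams(tokens, 2), 10)); // 4 bigrams, each appearing once
```

One design caveat: removing stop words before building n-grams joins words that were not adjacent in the text ("saw the dog" yields the bigram "saw dog"), and bigrams built from the joined page text also run across sentence and block boundaries ("dog cat" above). Skipping the stop-word filter when building n-grams, or building them per block, would avoid both.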

## ✅ Describe any prerequisites or dependencies that are required for this SmartBlock
<!-- List any required roam/js extensions, roam/css, other SmartBlocks etc. -->
Just Roam42

## 📷 Screenshot of your #42SmartBlock workflow/template from Roam
<!-- To ensure other users setup correctly, please provide a screenshot of your #42SmartBlock in Roam -->
<img width="383" alt="Screen Shot 2020-12-27 at 5 56 34 PM" src="https://user-images.githubusercontent.com/18430230/103181218-f17b7580-486c-11eb-8ef5-d0fe2166d64b.png">

## 💡 Additional Info
<!-- Add any other context, info, or screenshots/GIFs to help other users with this SmartBlock -->
![ezgif com-video-to-gif](https://user-images.githubusercontent.com/18430230/103179752-2da6da00-485d-11eb-80c9-96c7934c3288.gif)

<img width="593" alt="Screen Shot 2020-12-27 at 4 00 45 PM" src="https://user-images.githubusercontent.com/18430230/103179733-f0dae300-485c-11eb-99cf-a934b9770ce7.png">