brunoamaral / gregory-ai

Artificial Intelligence and Machine Learning to help find scientific research and filter relevant content
https://gregory-ai.com/
Other

Refactor db maintenance and improve the feedreader #262

Closed by brunoamaral 2 years ago

brunoamaral commented 2 years ago

Currently the db_maintenance tasks take over 10 minutes to run. The goal of this issue is to speed up the script and to move the logic it has in common with feedreader into functions that both can share.

An idea for the future: add an API endpoint that Node-RED can use to add new articles to the database, so that they run through the same pipeline of cleaning the abstracts and URLs and fetching information from Crossref.
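
As a rough illustration of the "fetching information from crossref" step that such a shared pipeline would need, here is a minimal sketch. It assumes the public Crossref REST API route `https://api.crossref.org/works/{doi}` and the `abstract` field inside its `message` object; the function names are hypothetical and are not taken from Gregory's code.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the public Crossref REST API "works" route.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_work_url(doi: str) -> str:
    """Build the Crossref lookup URL for a DOI (hypothetical helper)."""
    # Keep "/" unescaped because DOIs contain a prefix/suffix separator.
    return CROSSREF_WORKS + quote(doi.strip(), safe="/")


def abstract_from_crossref(payload: dict):
    """Return the abstract from a decoded Crossref response, or None if absent."""
    return payload.get("message", {}).get("abstract")


def fetch_abstract(doi: str, timeout: float = 10.0):
    """Fetch one work from Crossref over the network and return its abstract."""
    with urlopen(crossref_work_url(doi), timeout=timeout) as response:
        return abstract_from_crossref(json.load(response))
```

Keeping the URL building and the response parsing separate from the network call means both feedreader and db_maintenance could reuse them, and they can be tested without hitting Crossref.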

Discussed in https://github.com/brunoamaral/gregory/discussions/260

Originally posted by **brunoamaral** October 16, 2022

Journal Article Tag Suite (JATS) is a specification for structuring data in science papers. Right now, Gregory MS has 3,316 articles with JATS tags from a total of 14,573.

This wouldn't be an issue if feeds like PubMed gave out the full abstract. Querying crossref.org like we usually do with the DOI number means we will get the full abstract but with JATS tags structuring that string.

An example

```html
Purpose This exploratory study sought to identify acoustic variables explaining rate-related variation in intelligibility for speakers with dysarthria secondary to multiple sclerosis. Method Seven speakers with dysarthria due to multiple sclerosis produced the same set of Harvard sentences at habitual and slow rates. Speakers were selected from a larger corpus on the basis of rate-related intelligibility characteristics. Four speakers demonstrated improved intelligibility and three speakers demonstrated reduced intelligibility when rate was slowed. A speech analysis resynthesis paradigm termed hybridization was used to create stimuli in which segmental (i.e., short-term spectral) and suprasegmental variables (i.e., sentence-level fundamental frequency, energy characteristics, and duration) of sentences produced at the slow rate were donated individually or in combination to habitually produced sentences. Online crowdsourced orthographic transcription was used to quantify intelligibility for six hybridized sentence types and the original habitual and slow productions. Results Sentence duration alone was not a contributing factor to improved intelligibility associated with slowed rate. Speakers whose intelligibility improved with slowed rate showed higher intelligibility scores for duration spectrum hybrids and energy hybrids compared to the original habitual rate sentences, suggesting these acoustic cues contributed to improved intelligibility for sentences produced with a slowed rate. Energy contour characteristics were also found to play a role in intelligibility losses for speakers with decreased intelligibility at slowed rate. The relative contribution of speech acoustic variables to intelligibility gains and losses varied considerably between speakers. Conclusions Hybridization can be used to identify acoustic correlates of intelligibility variation associated with slowed rate. This approach has further elucidated speaker-specific and individualized speech production adjustments when slowing rate.
```

Options available:

1. Keep the tags and let the end user decide how to proceed
2. Translate the tags to standard html

I like option 1 because it gives more information to the user. The downside is that the browser doesn't parse these tags. I feel that the correct way to move forward would be to keep the tags and provide information on how to style them on html, which at the moment is outside of my ability.
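
For comparison, option 2 could be sketched roughly as follows. This assumes the abstracts arrive with `jats:`-prefixed elements such as `<jats:sec>`, `<jats:title>` and `<jats:p>`, as Crossref typically returns them. The tag mapping and the function name `jats_to_html` are illustrative choices, not part of Gregory or of the JATS specification.

```python
import re

# Assumed mapping from JATS element names to HTML ones; adjust to taste.
JATS_TO_HTML = {
    "sec": "section",
    "title": "h3",
    "p": "p",
    "italic": "i",
    "bold": "b",
    "sub": "sub",
    "sup": "sup",
}

# Matches opening, closing and self-closing tags in the "jats:" namespace.
_JATS_TAG = re.compile(r"<(/?)jats:([A-Za-z-]+)[^>]*>")


def jats_to_html(abstract: str) -> str:
    """Rewrite known jats: tags as HTML tags and drop unknown ones, keeping the text."""

    def swap(match):
        closing, name = match.groups()
        html_name = JATS_TO_HTML.get(name)
        if html_name is None:
            return ""  # unknown element: remove the tag, keep its content
        return f"<{closing}{html_name}>"  # attributes are intentionally discarded

    return _JATS_TAG.sub(swap, abstract)


if __name__ == "__main__":
    sample = "<jats:sec><jats:title>Purpose</jats:title><jats:p>This study...</jats:p></jats:sec>"
    print(jats_to_html(sample))
```

A regex pass like this is only reasonable because the abstracts are short fragments with simple markup. It discards attributes and any element missing from the mapping, which is exactly the information loss that makes option 1 attractive.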
brunoamaral commented 2 years ago

fixed in #270