OutlierVentures / QTM-Interface

GNU General Public License v3.0

Retrieving Data from Subprocess #37

Closed achimstruve closed 1 year ago

achimstruve commented 1 year ago

Does anyone know how we can retrieve the post-processed data from a script that is run via subprocess?

In line 45 of ./Model/interface.py we call subprocess to run the simulation script.

How can we, for example, access the data returned by the postprocessing function in line 68 of ./Model/simulation.py from inside the interface.py script, so that it is available for future plots on the website?
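
One possible approach, as a minimal sketch rather than what the repository does: have the child script print its post-processed result to stdout as JSON and let the parent parse it. The child code, the result keys (`timestep`, `token_price`) and the helper name `run_and_collect` are made up for illustration; in practice the command would point at the simulation script instead of `-c`.

```python
import json
import subprocess
import sys

# Stand-in for the simulation script (hypothetical): it prints its
# post-processed results to stdout as a single JSON document.
CHILD_CODE = """
import json
results = {"timestep": [0, 1, 2], "token_price": [1.0, 1.1, 1.25]}
print(json.dumps(results))
"""


def run_and_collect(code: str) -> dict:
    """Run Python code in a child process and parse the JSON it prints."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode the captured bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


data = run_and_collect(CHILD_CODE)
```

This only works if the child prints nothing else to stdout, and it gets unwieldy for large results, which is where a file or database becomes the more practical hand-off.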

achimstruve commented 1 year ago

My colleague @Scott-Canning mentioned that we could also use a SQLite3 database, as it is a straightforward approach, and I am quite confident that it is compatible with pandas DataFrames.

https://docs.python.org/3/library/sqlite3.html

cc.: @BlockBoy32
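
To illustrate the pandas compatibility, here is a minimal sketch of a round trip through a SQLite file, assuming pandas is installed. The table name `results`, the column names and the temporary file path are placeholders, not names from the repository.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical post-processed output; the column names are placeholders.
df = pd.DataFrame({"timestep": [0, 1, 2], "token_price": [1.0, 1.1, 1.25]})

# A file-backed database, so that a second process could open the same path.
db_path = os.path.join(tempfile.mkdtemp(), "simulation.db")

# Writer side (for example, the simulation script).
conn = sqlite3.connect(db_path)
df.to_sql("results", conn, if_exists="replace", index=False)
conn.commit()
conn.close()

# Reader side (for example, the interface script).
conn = sqlite3.connect(db_path)
loaded = pd.read_sql_query("SELECT * FROM results ORDER BY timestep", conn)
conn.close()
```

`if_exists="replace"` overwrites the table on each run; `"append"` would keep earlier runs instead.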

Scott-Canning commented 1 year ago

Sharing a simple example for creating a sqlite3 database file (independent script file to be run once or for overwriting) and an insertion function:

import sqlite3

# Connect to the database
conn = sqlite3.connect('tweets_auto.db')

# Get a cursor for executing SQL statements
cursor = conn.cursor()

# Create the table that stores one row per tweet
cursor.execute('''CREATE TABLE tweets
                  (id TEXT PRIMARY KEY,
                   author_id TEXT,
                   username TEXT,
                   created_at TEXT,
                   impression_count INTEGER,
                   like_count INTEGER,
                   quote_count INTEGER,
                   reply_count INTEGER,
                   retweet_count INTEGER,
                   text TEXT)''')

# Commit the changes and close the connection
conn.commit()
conn.close()

def load_user_tweets(user_tweets):
    # Connect to the database
    conn = sqlite3.connect('tweets_auto.db')
    cursor = conn.cursor()

    # Pull username
    username = user_tweets['includes']['users'][0]['username']

    # Insert data
    for tweet in user_tweets['data']:
        try:
            cursor.execute("INSERT INTO tweets VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
                (
                    tweet['id'], 
                    tweet['author_id'],
                    username,
                    tweet['created_at'],
                    tweet['public_metrics']['impression_count'],
                    tweet['public_metrics']['like_count'],
                    tweet['public_metrics']['quote_count'],
                    tweet['public_metrics']['reply_count'],
                    tweet['public_metrics']['retweet_count'],
                    tweet['text']
                )
            )

        except Exception as error:
            # Handle any errors that occur during data insertion
            print(f"Error upon load: {error}")
            conn.rollback()

        else:
            # Commit the changes to the database if no errors occurred
            conn.commit()

    # Close the database connection
    conn.close()

BlockBoy32 commented 1 year ago

@Scott-Canning Thanks for the advice. @achimstruve I implemented a POC with all of the data flowing properly. It still seems to be very slow in the interface, but I cannot pin down why. I can dig into it more when I have the cycles, cheers.

BlockBoy32 commented 1 year ago

Also, @achimstruve, I am going to leave this open for now: while we solved the core issue, I want a reminder to do performance improvements, specifically for the interface interactions.

BlockBoy32 commented 1 year ago

@achimstruve Hey, could you take a swing at this? For some reason mine is still really slow and I can't seem to identify why.

achimstruve commented 1 year ago

I have now run some performance tests in our Streamlit interface, with the following results:

Run empty script: <1s
Simulation only: 33s
Simulation + Postprocessing: 43s

This shows that the calculation work is effectively being done inside Streamlit, which makes it quite slow compared to running the same script in pure Python.
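
For anyone who wants to reproduce this kind of per-stage measurement, here is a minimal sketch using only the standard library. The `timed` helper and the two stand-in functions are hypothetical and are not the actual simulation code.

```python
import time
from contextlib import contextmanager

# Collected wall-clock durations in seconds, keyed by stage label.
timings = {}


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def run_simulation():
    # Stand-in for the real simulation (hypothetical workload).
    return [i * i for i in range(100_000)]


def postprocess(raw):
    # Stand-in for the real postprocessing step (hypothetical).
    return sum(raw)


with timed("simulation"):
    raw = run_simulation()
with timed("postprocessing"):
    total = postprocess(raw)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

Running the same stages once from a plain Python shell and once from inside the interface gives a direct comparison of the two environments.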

This Streamlit blog post gives several tips for improving Streamlit performance.

I will close this issue as it is solved and open a new one dedicated to the UI performance.

The new one is #47.