Open casaout opened 8 months ago
As briefly discussed yesterday, we think that it's better to not work with encrypted sqlite-databases to avoid making it too complicated for other researchers for their data analysis. Hence, we decided to create an encrypted zip-file.
I've started testing how reading an encrypted zip-file with a non-encrypted sqlite database works in Python, but am experiencing character encoding issues as summarized below. For time reasons, I'd like to ask you @SRichner to try and see if you can make it work.
What I tried:
Thoughts:
import sqlite3
import pyzipper
import io
subject = "32XSB1"
zipPath = "C:\\TEMP\\PA\\PA_32XSB1.encrypted.zip"
pwd = "PersonalAnalytics_32XSB1"
dbfile = "database.sqlite"
tableName = "user_input"
charset = "iso-8859-1" # error: 'near "SQLite": syntax error'
# charset = "utf-8" # error: 'utf-8' codec can't decode byte 0xe7 in position 27: invalid continuation byte
with pyzipper.AESZipFile(zipPath) as zf:
# try 1: extracting then opening the file (in explorer) works
# zf.extractall(path='C:\\TEMP\\PA\\PA_32XSB1_decrypted', pwd = bytes(pwd, 'utf-8'))
# print(zf.infolist())
# try 2: read log/txt file from encrypted zip file works
# try 3: read sqlite file from encrypted zip file results in charset issues
for file_info in zf.infolist():
print(file_info)
if file_info.filename == dbfile:
with zf.open(file_info, pwd=bytes(pwd, charset)) as sqlite_file:
sqlite_data = sqlite_file.read()
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
sqlite_data_str = sqlite_data.decode(charset)
# print(sqlite_data_str) # printing the string actually shows decrypted content
# bug: 'near "SQLite": syntax error'
cursor.executescript(sqlite_data_str) # conn.executescript(sqlite_data_str) # same issue
# Query to list all tables in the database
cursor.execute("SELECT name FROM external.sqlite_master WHERE type='table';")
# Fetch and print the table names
tables = cursor.fetchall()
for table in tables:
print(f"Table: {table[0]}")
cursor.execute(f"PRAGMA table_info(external.{table[0]});")
columns = cursor.fetchall()
print("Columns:")
for column in columns:
print(column)
conn.close()
Note that work in progress is described below and in this branch.