InsightSSG / Net-Manage

Net-Manage is a repository for storing re-usable functions and playbooks for automation.
MIT License

Create a function to read tabular text from a variable into a dataframe #301

Open ascension2020 opened 1 year ago

ascension2020 commented 1 year ago

There are several ways to read tabular data from a file directly into a pandas dataframe. For example:

import pandas as pd
# Split columns on runs of two or more whitespace characters
data = pd.read_table("filename.txt", sep=r'\s{2,}', engine='python')

However, it is often necessary to read tabular text that is already stored in a variable. Wrapping the string in StringIO does the trick:

import pandas as pd
from io import StringIO
# 'text' is a string holding the tabular data
data = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python')

Tabular data often has a row of dashes (---- ---- ---- ----) between the column headers and the data. In those cases, the "header" and "skiprows" args handle it: header=0 keeps the first line as the column names, and skiprows=[1] drops the dashed separator:

import pandas as pd
from io import StringIO
data = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python', header=0, skiprows=[1])

Here is an example:

text = '''Caption            IPAddress       Description        Vendor
-----------------  --------------  -----------------  ------------------
panorama1          10.10.10.1      Panorama Server    Palo Alto Networks
paloalto2          10.20.10.1      Palo Alto          Palo Alto Networks'''

import pandas as pd
from io import StringIO
data = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python', header=0, skiprows=[1])

data.to_dict()
{'Caption': {0: 'panorama1', 1: 'paloalto2'},
 'IPAddress': {0: '10.10.10.1', 1: '10.20.10.1'},
 'Description': {0: 'Panorama Server', 1: 'Palo Alto'},
 'Vendor': {0: 'Palo Alto Networks', 1: 'Palo Alto Networks'}}
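Since the issue asks for a reusable function, the snippets above could be combined into one helper. A minimal sketch, assuming we want dash-separator rows detected automatically rather than passed in via skiprows (the name tabular_text_to_df and the dash-row detection are suggestions, not existing repo code):

```python
from io import StringIO

import pandas as pd


def tabular_text_to_df(text: str) -> pd.DataFrame:
    """Read whitespace-aligned tabular text from a string into a DataFrame.

    Columns are split on runs of two or more spaces. Any line made up
    entirely of dashes (a separator under the header) is skipped, so the
    caller does not need to pass header/skiprows manually.
    """
    # Collect the 0-based indices of separator lines like '---- ---- ----'
    dash_rows = [
        i for i, line in enumerate(text.splitlines())
        if line.strip() and set(line.strip()) <= set('- ')
    ]
    return pd.read_csv(
        StringIO(text),
        sep=r'\s{2,}',
        engine='python',
        skiprows=dash_rows,
    )
```

With the example text above, tabular_text_to_df(text) returns the same two-row dataframe whether or not the dashed separator line is present.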
ngsouse commented 1 year ago

Take a look at datatable. It was built for tabular data and claims to be faster than pandas.

https://github.com/h2oai/datatable

ascension2020 commented 1 year ago

> Take a look at datatable. It was built for tabular data and claims to be faster than pandas.
>
> https://github.com/h2oai/datatable

Thanks for the share. I checked it out over the weekend. It is interesting, especially since we process so much text.

Let's sync internally regarding next steps.