FuchsiaSoft / FLUFFS

An enterprise scale file interrogation and information discovery solution for Windows networks
GNU Affero General Public License v3.0

Support for excel files beyond shared strings #1

Closed FuchsiaSoft closed 8 years ago

FuchsiaSoft commented 9 years ago

The current implementation of ODF ExcelReader only pulls content from the shared strings XML, so anything stored outside it is not read.

FuchsiaSoft commented 9 years ago

I've had a deeper look at this now. Each worksheet XML file in xl\worksheets has its own internal values plus references to the shared strings, so we just need to cross-reference the two. In this instance it may be easier to use the Open XML SDK rather than reading the XML manually, as we do for Word files. Further research needed.
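The cross reference described above can be sketched in a few lines. This is only an illustration in Python using the standard library, not the project's reader and not the Open XML SDK; the function names `read_cell_values` and `demo_workbook` and the sample data are hypothetical. It assumes the usual package layout (`xl/sharedStrings.xml` and `xl/worksheets/*.xml`; zip entry names use forward slashes) and does not filter out phonetic runs or handle dates and number formats.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by sharedStrings.xml and the worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_cell_values(workbook):
    """Return {worksheet part name: [cell text, ...]} for a path or file-like .xlsx."""
    with zipfile.ZipFile(workbook) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            # Each <si> is one entry; rich text splits it over several <t> runs.
            for si in root.findall("m:si", NS):
                shared.append("".join(t.text or "" for t in si.findall(".//m:t", NS)))

        result = {}
        for name in sorted(zf.namelist()):
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            values = []
            sheet = ET.fromstring(zf.read(name))
            for cell in sheet.findall(".//m:c", NS):
                v = cell.find("m:v", NS)
                if cell.get("t") == "s" and v is not None:
                    # t="s": <v> holds an index into the shared strings table.
                    values.append(shared[int(v.text)])
                elif cell.get("t") == "inlineStr":
                    values.append("".join(t.text or "" for t in cell.findall(".//m:t", NS)))
                elif v is not None:
                    # Numbers, booleans and cached formula results sit directly in <v>.
                    values.append(v.text or "")
            result[name] = values
        return result


def demo_workbook():
    """Build a tiny in-memory package holding only the two parts read above."""
    ns = NS["m"]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/sharedStrings.xml",
                    f'<sst xmlns="{ns}"><si><t>alpha</t></si><si><t>beta</t></si></sst>')
        zf.writestr("xl/worksheets/sheet1.xml",
                    f'<worksheet xmlns="{ns}"><sheetData><row>'
                    '<c t="s"><v>1</v></c><c><v>42</v></c><c t="s"><v>1</v></c>'
                    '</row></sheetData></worksheet>')
    buf.seek(0)
    return buf
```

Calling `read_cell_values(demo_workbook())` resolves both `t="s"` cells to the same shared string and passes the plain number through unchanged. The demo package is not a complete workbook that Excel would open; it only carries the parts this sketch reads.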

FuchsiaSoft commented 9 years ago

Following further discussion around this: would we want a separate instance of each string? The current implementation doesn't search for frequency of strings or RegEx matches, but that could conceivably be useful in the future. So we should possibly parse "correctly", reading each cross-referenced shared string separately.
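To make the trade-off concrete: the shared strings table stores each distinct string once, so frequency or RegEx counts only come out right if every cell reference is resolved individually. A rough Python sketch follows, assuming the shared strings and the per-cell indices have already been extracted; `count_strings`, `count_regex_matches` and the sample data are hypothetical names, not part of the current code.

```python
import re
from collections import Counter


def count_strings(shared, cell_refs):
    """Count how often each shared string is used, given one index per cell."""
    return Counter(shared[i] for i in cell_refs)


def count_regex_matches(pattern, shared, cell_refs):
    """Count regex matches across every cell occurrence, not just unique strings."""
    rx = re.compile(pattern)
    return sum(len(rx.findall(shared[i])) for i in cell_refs)


# Hypothetical data: two unique strings, three cells, "beta" used twice.
shared = ["alpha", "beta"]
cell_refs = [1, 0, 1]
frequencies = count_strings(shared, cell_refs)
matches_per_cell = count_regex_matches(r"a", shared, cell_refs)
matches_unique_only = sum(len(re.findall(r"a", s)) for s in shared)
```

Here the per-cell count of `a` is higher than the count over the unique strings alone, because "beta" appears in two cells. That difference is what would be lost by reading only the shared strings table.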

FuchsiaSoft commented 9 years ago

I think using the ExcelDataReader library is the way to go for this, although we need to do some extensive testing of how well it supports the wide array of file formats we'll find. It's certainly the solution for binary files.