QuentinJanuel / AccessDB-parser

A pure javascript Microsoft AccessDB files (.mdb, .accdb) parser
MIT License

Program incorrectly parses IPEDS database #6

Open adruzenko03 opened 3 years ago

adruzenko03 commented 3 years ago

Hello, I am using this library to parse the IPEDS database (Download Link: Here). My code currently looks like this:

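const fs = require("fs");
const { AccessParser } = require("accessdb-parser");

// Load the whole .accdb file into a Buffer and hand it to the parser
const db = new AccessParser(fs.readFileSync("IPEDS201819.accdb"));

// Get all necessary tables
const GENERALTABLE = db.parseTable("HD2018");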

When I run this, I get a large number of log lines in the following format:

Memo data inline
Parsing memo field ♂���077227858
Memo data inline
Parsing memo field ↔���Concordia University Irvine
Memo data inline
Parsing memo field ♂���076084946
Memo data inline
Parsing memo field ☻�
Memo data inline

The output was irritating but did not seem to cause any issues, so I copied the library code and started deleting log statements. As I did, I found more log output, in the following order:

Overflow record flag is not present 2990
LVAL type 1
Overflow record flag is not present 2848
LVAL type 1
Overflow record flag is not present 2776
LVAL type 1
Overflow record flag is not present 2572
LVAL type 1
Failed to parse memo field. Using data as bytes
Failed to parse memo field. Using data as bytes
Failed to parse memo field. Using data as bytes
Failed to parse memo field. Using data as bytes
Failed to parse memo field. Using data as bytes
Failed to parse memo field. Using data as bytes
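
As an alternative to editing a copy of the library, output like the above could be suppressed by temporarily replacing the console methods while the table is parsed. This is only a minimal sketch: `withSilencedConsole` is a hypothetical helper (not part of the library), and it assumes the library prints through the standard `console` methods and that `parseTable` runs synchronously, as it is used in the snippet above.

```javascript
// Hypothetical helper: run `fn` with the listed console methods turned into no-ops,
// then restore the originals even if `fn` throws.
function withSilencedConsole(fn, methods = ["log", "info", "warn", "debug"]) {
  const saved = {};
  for (const name of methods) {
    saved[name] = console[name];
    console[name] = () => {};
  }
  try {
    return fn();
  } finally {
    for (const name of methods) {
      console[name] = saved[name];
    }
  }
}

// Usage (assumes `db` is the AccessParser instance created earlier):
// const GENERALTABLE = withSilencedConsole(() => db.parseTable("HD2018"));
```

This only hides the messages. It does not change what the parser returns, so any underlying parsing problem would still be there.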

Once again, I am not sure whether these messages come from errors being thrown or are simply leftover log statements, as the returned data seems to work fine. This is the first issue I have reported on GitHub, so apologies if it is unprofessional or lacks detail. Please ask if you need more information.

Alex

adruzenko03 commented 3 years ago

After looking into it more, it seems the data has been read incorrectly. Here is the result of db.parseTable("HD2018").lines[0]

[
  '100654',
  '1',
...Everything fine between here...
  '119',
  '1',
  'Alabama A & M University',
  'AAMU',
  '4900 Meridian Street',
  'Normal',
  'AL',
  '35762Dr. Andrew Hugine, Jr.President2563725000636001109 \x0B耀\x00\x00\x00\x00㤱㈷㘱㔴〵 ㄀  ㈀    眀眀眀⸀愀愀洀甀⸀攀搀甀⼀眀眀眀⸀愀愀洀甀⸀攀搀甀⼀䄀 
搀洀椀猀猀椀漀渀猀⼀倀愀最攀猀⼀搀攀昀愀甀氀琀⸀愀猀瀀砀眀',
  '',
  'President',
  '2563725000',
  '636001109 ',
  '197216455',
  '00100200  ',
  'www.aamu.edu/',
  'www.aamu.edu/Admissions/Pages/default.aspxwww.aamu.edu/admissions/fincialaid/pages/default.aspxhttps://www.aamu.edu/Admissions/UndergraduateAdmissions/Pages/Apply%20Today',
  '',
  'https://www.aamu.edu/Admissions/UndergraduateAdmissions/Pages/Apply%20Today.aspxhttps://galileo.aamu.edu/NetPriceCalculator/npcalc.htm  www.aamu.edu/administrativeoffices/VADS/Pages/Disability-Services.aspxA-',
  '',
  ' ',
  ' ',
  'www.aamu.edu/administrativeoffices/VADS/Pages/Disability-Services.aspx',
  'A-2        -2                                                                              -2Madison County\x80\x00\x00\x00\x00\x00㘀⣮\x05\x00\x00\x00\x00\x00\x00삈ȒӭӜӋүҫ',
  '',
  '-2                                                                              ',
  '-2',
  'Madison County',
  '�\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x006�(\x05',
  ''
]

I have no idea why it breaks at the zip code field in particular. It would be great if you could look into it.
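
To see how many rows are affected, the parsed rows can be scanned for cells that contain control characters or the Unicode replacement character, which is what the broken cells above contain (`\x0B`, `\x00`, `\x80`, `�`). This is a rough sketch, not a fix: `findSuspiciousCells` is a hypothetical helper, and it assumes each entry of `.lines` is an array of strings, as in the output above. It will not catch cells that are merely concatenated with their neighbours but otherwise contain clean text.

```javascript
// Control characters (except tab, LF, CR), C1 controls, and U+FFFD.
// No `g` flag, so RegExp.prototype.test() keeps no state between calls.
const SUSPICIOUS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F\uFFFD]/;

// Hypothetical helper: return the indexes of cells in one row that look corrupted.
function findSuspiciousCells(row) {
  const hits = [];
  row.forEach((cell, index) => {
    if (typeof cell === "string" && SUSPICIOUS.test(cell)) {
      hits.push(index);
    }
  });
  return hits;
}

// Usage (assumes `db` is the AccessParser instance created earlier):
// const rows = db.parseTable("HD2018").lines;
// rows.forEach((row, i) => {
//   const bad = findSuspiciousCells(row);
//   if (bad.length > 0) console.log(`row ${i}: suspicious cells at`, bad);
// });
```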

QuentinJanuel commented 3 years ago

Hi,

I'm sorry, this project is just a TypeScript rewrite of a Python version, so I have no clue how it works under the hood. Chances are this bug is also in the Python version, so I'd recommend posting the issue on their repo: https://github.com/claroty/access_parser

Best of luck