FeedTheCube / CogDoc


Multiple Database Connections/SQL #27

Open hyndgrinder opened 6 years ago

hyndgrinder commented 6 years ago

Cognos content stores can run on basically any mainstream data provider. We should build connectors and handle any SQL customizations required for specific platforms. Others can be added as needed.

Initial targets should include MSSQL, Oracle, DB2, and possibly Sybase and MySQL.

SinclairC commented 6 years ago

Are we going to know beforehand what platform we're about to connect to, or do we have to send a request somewhere to get that info?

hyndgrinder commented 6 years ago

Here is a link to the content store KB article @ IBM.

I used a solution called Generate Data, and it detects if the app is configured properly, and if not sends the user to a different start page, which collects the connection data, and other preferences, then populates the web.config file. That may be more complicated than needed, but we do need a config file. I will create a new issue for setting up the web.config.

SinclairC commented 6 years ago

Steps?

  1. User points Python application to content store
  2. Python determines type of content store somehow
  3. Python somehow determines correct location of report in content store (by user-supplied report name?)
  4. Python downloads XML report from content store
  5. Python parses XML
  6. Python generates an HTML version of the important parts of the report
  7. The HTML is exported to a filename of the user's choosing
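The steps above could be sketched roughly like this. Everything here is a hypothetical outline, not existing CogDoc code; steps 1-4 (connecting and fetching the spec) are assumed to have already produced the XML string:

```python
import xml.etree.ElementTree as ET

def render_html(root):
    # Step 6: keep only the parts of the report we care about.
    # Using the <title> element as a stand-in for "the important parts".
    title = root.findtext("title", default="Untitled")
    return "<html><body><h1>{}</h1></body></html>".format(title)

def export_report(xml_spec):
    # Step 5: parse the XML spec fetched from the content store
    root = ET.fromstring(xml_spec)
    # Step 7: the caller writes the returned HTML to a user-chosen filename
    return render_html(root)

html = export_report("<report><title>Sales</title></report>")
```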

Questions:

  - Is this correct?
  - Are the reports stored in a different form, or pieced together from different locations in the content store, or anything weird like that?
  - Any thoughts on step 3?
  - Do we have to deal with decryption or anything?
  - Are we planning to have the user enter their database login info each time, or store their login info?

hyndgrinder commented 6 years ago
  1. User points Python application to content store - CORRECT
  2. Python determines type of content store somehow - USER PROVIDED IN POINT 1
  3. Python somehow determines correct location of report in content store (by user-supplied report name?) - PYTHON EXECUTES A SQL SCRIPT TO ITERATE THROUGH THE CONTENT STORE
  4. Python downloads XML report from content store - THE CONTENT STORE SAVES THE XML, PYTHON WILL QUERY AS NECESSARY.
  5. Python parses XML - CORRECT
  6. Python generates an HTML version of the important parts of the report - WHEN USER SELECTS A REPORT AND EXECUTES AN EXPORT COMMAND (CHECKLIST AND A BUTTON)?
  7. The HTML is exported to a filename of the user's choosing - YES, BUT LET'S SUGGEST THE REPORT'S reportName PROPERTY IN THE DIALOGUE BOX.

Questions:

  - Is this correct? DISCUSSED ABOVE.
  - Are the reports stored in a different form or pieced together from different locations in the content store or anything weird like that? THEY ARE COMPLETE, IN A FIELD CALLED SPEC IN THE CMOBJECTS TABLE.
  - Any thoughts on step 3? THE ONLY CONTENT STORE OBJECTS WE'RE LOOKING AT TO START ARE REPORTS.
  - Do we have to deal with decryption or anything? UNLIKELY. COGNOS DOESN'T STORE ANY DATA, SO IT'S DOUBTFUL USERS ARE ENCRYPTING. I RESERVE THE RIGHT TO BE INCORRECT.
  - Are we planning to have the user enter their database login info each time or store their login info? WE SHOULD LEAVE THAT UP TO THE USER AS PART OF WEB.CONFIG, BUT IF THEY CHOOSE TO STORE IT, IT WILL HAVE TO BE ENCRYPTED.
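The CMOBJECTS/SPEC lookup described here could be sketched as below. The table and column names (CMOBJECTS, CMID, SPEC) come from this thread; verify them against the actual content store schema. The in-memory SQLite database is just a stand-in for a real pyodbc/MySQLdb connection, and note MySQLdb uses `%s` rather than `?` as its parameter placeholder:

```python
import sqlite3  # throwaway stand-in for the real content store connection

def fetch_report_spec(conn, cmid):
    # Pull the complete XML spec for a single report by its CMID
    cur = conn.cursor()
    cur.execute("SELECT SPEC FROM CMOBJECTS WHERE CMID = ?", (cmid,))
    row = cur.fetchone()
    return row[0] if row else None

# Demo against a mocked-up CMOBJECTS table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CMOBJECTS (CMID INTEGER PRIMARY KEY, SPEC TEXT)")
conn.execute("INSERT INTO CMOBJECTS VALUES (1, '<report/>')")
spec = fetch_report_spec(conn, 1)
```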

SinclairC commented 6 years ago
  1. Ok, so the user will always know what type of content store they're connecting to? Wasn't sure if that info was readily available since it seemed like it was all handled behind-the-scenes normally. That simplifies things a lot. Sounds good to me.

  2. Yes, but what is it looking for? A unique report name? A report ID? A user ID so multiple reports are shown? A timeframe? All of the above?

  3. Sounds good.

  4. Maybe it should just save them with the report names by default. It'd be faster, simpler and allow for multiple reports at once without having to prompt the user. They can always rename things if they want afterward. I don't know. What's your preference? More user control during the process, or less complexity?

Have you ever set up a web.config file? I haven't, but what I'm reading makes it seem like they're not particularly suited to our current project. Seems like they're mostly an ASP.NET thing and generally have a lot of info in them that we don't really need. I could be wrong, but it might make sense to make an XML config file and not necessarily a typical "web.config" one.

hyndgrinder commented 6 years ago
  1. Ok, so the user will always know what type of content store they're connecting to? Wasn't sure if that info was readily available since it seemed like it was all handled behind-the-scenes normally. That simplifies things a lot. Sounds good to me. THE ADMIN ALWAYS WILL. PRESUMABLY, THE ADMIN WILL BE OUR FIRST USER. GENERAL USERS PROBABLY WON'T NEED THESE DETAILS.

  2. Yes, but what is it looking for? A unique report name? A report ID? A user ID so multiple reports are shown? A timeframe? All of the above? TBD, BUT WE COULD/SHOULD USE THE CMID FROM THE CMOBJECTS TABLE TO UNIQUELY ID INDIVIDUAL REPORTS.

  3. Sounds good.

  4. Maybe it should just save them with the report names by default. It'd be faster, simpler and allow for multiple reports at once without having to prompt the user. They can always rename things if they want afterward. I don't know. What's your preference? More user control during the process, or less complexity? THE CMID SHOULD MAKE ITS WAY INTO THE REPORT'S FILENAME IF BATCH EXPORTING TO ENSURE NOTHING IS OVERWRITTEN. INDIVIDUAL EXPORTS CAN BE SUGGESTED, BUT MANAGED BY THE USER.

Have you ever set up a web.config file? I haven't, but what I'm reading makes it seem like they're not particularly suited to our current project. Seems like they're mostly an ASP.NET thing and generally have a lot of info in them that we don't really need. I could be wrong, but it might make sense to make an XML config file and not necessarily a typical "web.config" one. I'M REFERRING MORE TO THE CONCEPT THAN TO A SPECIFIC TECHNOLOGY. PYTHON/FLASK HAS SOME SORT OF EQUIVALENT; AN XML CONFIG WOULD BE FINE WITH ME, ESPECIALLY IF/WHEN WE HAVE A CONFIGURATION UI. AT THAT POINT, IT WON'T MATTER...
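An XML config along those lines might look like the sketch below. Every element and attribute name here is an assumption for illustration, not a settled schema; the point is just that the platform from step 1 and the connection details live in one file the app reads at startup:

```python
import xml.etree.ElementTree as ET

# Hypothetical config layout; key names are illustrative assumptions
CONFIG_XML = """
<config>
  <contentStore platform="mssql" server="localhost" database="cstore"/>
  <export defaultFolder="exports"/>
</config>
"""

def load_config(xml_text):
    root = ET.fromstring(xml_text)
    store = root.find("contentStore")
    return {
        "platform": store.get("platform"),
        "server": store.get("server"),
        "database": store.get("database"),
        "export_folder": root.find("export").get("defaultFolder"),
    }

cfg = load_config(CONFIG_XML)
```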

SinclairC commented 6 years ago

That all sounds good. I'll get to work on this stuff. Two more questions, though:

  1. Are you thinking it'll be a web-based interface (for the user to specify a CMID, content-store and platform), or do you want an actual application?

  2. How do you think we should test this? I don't have access to any content stores to try connecting to. I can set up a basic MySQL database on my local web server, and I should probably be fine testing MSSQL as well. Not sure about the others, though. Maybe we'll just focus on those for now? I mean, I can write code that will likely work for connecting to other platforms, I just won't be able to test it yet.

SinclairC commented 6 years ago

Your latest branch looked good, but when I tried it, it didn't work (Guess I should've tested it before accepting the pull request... Wasn't thinking, apparently). It says CogDoc doesn't exist when importing dataConnections (The project also didn't contain dataConnections.py, so I had to add it manually).

Any idea why Visual Studio can't find CogDoc.dataConnections?

Also, do you mind if we change D:\UploadData to relative path to a folder within the project? I'd rather not put that in the root of my D drive.

Edit: In my case, I had to just change it to "import dataConnections as DC" without mentioning CogDoc. The file is in the same place it is for you, so I'm not sure what's different.

I'm also thinking we shouldn't have any method definitions in main.py and should move those to Util or possibly a new PY file for DB-related methods.
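Swapping D:\UploadData for a project-relative folder could be as simple as the following sketch; the folder name is kept from the original, and anchoring on the script's own location (rather than the current working directory) is one reasonable choice, not the only one:

```python
from pathlib import Path

# Resolve the upload folder relative to this file instead of a
# hard-coded drive root like D:\UploadData
PROJECT_ROOT = Path(__file__).resolve().parent
UPLOAD_DIR = PROJECT_ROOT / "UploadData"
UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
```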

hyndgrinder commented 6 years ago

Yes, please use relative paths. I will do so in future as well. I got caught rushing through my trials, and didn't properly clean up the code, but I wanted to give you some insight as to where I was going. Clearly, that plan didn't work.

I agree that main should be limited to function calls, not functions themselves. Was just trying to get you a connection to the content store so you could see the default iteration of reports, and didn't have time to separate it. Also, I think pyodbc is supposed to be able to connect to multiple platforms (MSSQL, MySQL, Oracle, ...not sure about DB2). If not, let's find a library that does (SQLAlchemy?). I don't really want to reinvent the wheel on non-core functionality. We'll also do our best to structure our queries in standard SQL, to minimize platform-specific coding.

The struggle that you and I are having sharing code and databases is only a glimpse of what are likely to be complicated landscapes in customer deployments of Cognos. For example, every deployment is likely to have at minimum 3 content stores: one each for DEV, TEST, & PROD. Each of those could be a different DB platform, server, or database. These collections of content stores could, presumably, benefit from comparisons, and possibly other aggregated analysis.

SinclairC commented 6 years ago

Happy New Year.

The SQL changes you made look good. Much simpler to work with across multiple DB styles.

Also, I looked into pyodbc and it seems like it'll support MySQL and Oracle. The "tricky" part is that it requires a special driver for each type of DB (no idea if we can assume the driver stays the same on different computers), while MySQLdb works out-of-the-box. I may be wrong, but what I read about setting up pyodbc concerned me a bit. I'm wondering if it might be best to use individual libraries that are specifically for the DB types we're targeting. In order to avoid letting that complicate the code, we should still have the same method names available (see the current branch, where both MSSQL and MySQL use "dbConnect()" with different configurations).
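The "same method name, different backend" pattern described here might look like the sketch below. This is not the actual branch code: the config keys and the ODBC driver string are assumptions (driver names do vary between machines, which is the concern raised above), and the drivers are imported lazily so only the one actually used needs to be installed:

```python
def dbConnect(platform, **cfg):
    # One entry point regardless of backend; cfg keys (server, database,
    # user, password) are illustrative assumptions
    if platform == "mssql":
        import pyodbc  # needs a platform-specific ODBC driver installed
        return pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=%(server)s;DATABASE=%(database)s;"
            "UID=%(user)s;PWD=%(password)s" % cfg
        )
    if platform == "mysql":
        import MySQLdb  # works out-of-the-box, no separate driver
        return MySQLdb.connect(host=cfg["server"], db=cfg["database"],
                               user=cfg["user"], passwd=cfg["password"])
    raise ValueError("unsupported platform: %s" % platform)
```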

On another note, I'm curious what sort of "flow" you are envisioning for this program.

Right now, I'm a bit confused as to why we grab all reports from the database and parse them before the user has even chosen what they want. Is it just to save time while looking through the reports, since everything's already loaded/parsed, or is it just because it's a work-in-progress? It seems like there could be a lot of reports grabbed, so it might be best to only parse the chosen one(s).
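A parse-on-demand shape for this might look like the following. All the names here are illustrative, and the in-memory dict stands in for rows fetched from CMOBJECTS; the idea is just that listing reports is cheap and the XML parse only happens for the report(s) the user actually selects:

```python
import xml.etree.ElementTree as ET

class ReportStore:
    def __init__(self, rows):
        # rows: (cmid, name, xml_spec) tuples fetched from the content store
        self._rows = {cmid: (name, spec) for cmid, name, spec in rows}

    def list_reports(self):
        # Cheap listing for the selection UI: no XML parsing happens here
        return [(cmid, name) for cmid, (name, _) in self._rows.items()]

    def parse(self, cmid):
        # Parse a single report's spec only when the user selects it
        _, spec = self._rows[cmid]
        return ET.fromstring(spec)
```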