KiaraGrouwstra closed this issue 8 years ago
Hi @IvanBond, I'd just received a pull request from @ImkeF, and she'd drawn my attention to your further additions.
I'd be glad to merge them in, but I'm seeing an issue with your move of the hardcoded file path from `LoadPath.m` to `Load.m`. I guess the problem is clear -- as soon as we merge in code containing one another's file path, loading would break until we'd manually change the path back to our own. That's somewhat undesirable, since it defeats the purpose of using Git to share code.
With that in mind, I might be able to merge in all but those two commits using `git cherry-pick`, but from what I've read, this can lead to some undesirable results, so I'm hesitant to go that route.
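(For reference, the commands could look something like the following; the hashes are hypothetical stand-ins for your commits, not real ones from the repo.)

```
# apply Ivan's commits onto a fresh branch one by one,
# skipping the two commits that moved the file path
git checkout -b merge-ivan master
git cherry-pick 1a2b3c4 5d6e7f8
```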
I think I do see a solution. If we could store our personal file path in a file ignored by Git (i.e. listed in `.gitignore`), like that `LoadPath.m`, we could keep that information intact for use without affecting the other files we're sharing on Git. To guide new users, we could include a renamed copy as an example file in the repository.
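Concretely, the setup I have in mind would be something like this (a sketch of the convention, not yet committed code):

```
# .gitignore -- keep each user's personal path out of version control
LoadPath.m
```

A committed `LoadPath.example.m` would then document the expected format; new users copy it to `LoadPath.m` and fill in their own path.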
Might something like that work for you as well? Perhaps this leads back to what motivated your change originally, which may be of interest to others too. Did you notice a performance difference between the 'external file' and 'hardcoded in the Load script' approaches?
Hi Tycho! The main reason I hardcoded the path into `Load.m` is that I want to avoid the presence of additional objects in my workbooks - a table or a separate query. A table with the path on a worksheet definitely has a slight performance impact: PQ does not read from Excel objects instantly, and who knows how many times it reads them during a refresh of several queries. I have to refresh ~100 workbooks daily and some of them hourly, each containing 5 to 15 queries, so I'm trying to optimize every step.
As for the library of functions, it is fine to leave everything as you developed it initially. Anyway, `Load.m` is always inserted manually (I don't think anyone creates workbooks with queries using VBA and loads functions from GitHub yet).
Thanks, Ivan
Hm, that definitely does put more performance strain on things than I'd considered so far. If performance is vital here, might you get a further speedup by importing the queries into a workbook using those VBA functions I'd written?
If this sounds terrible, it could be improved -- currently my VBA doesn't yet offer a way to import queries into a bunch of workbooks in batch, so that's one thing that could be added. Hell, even if it were just used for batch-importing the Load function into a bunch of workbooks, that might be useful.
Anyway, if reimporting on every change sounds ridiculous, just know it's not a full either/or situation. If having the queries locally does improve performance, you could import some battle-tested functions that get called a lot, while keeping external the ones you're still changing more frequently.
Out of curiosity: it sounds like you have a bunch of similar files. I suppose for your use-case it wouldn't do to have one central file import data from all of them for processing, then slice again as needed, huh? Having to juggle that many files definitely sounds like a pain. :(
Thanks, Tycho. The workbook you developed is great; I saw it a long time ago but couldn't try these techniques in real work, because we are still on Excel 2013. This is the main problem. Even if I install Excel 2016, the rest of the organization won't be able to use my solutions.
What does a typical self-service Excel-based solution look like (with Power Query and a Power Pivot data model)? In the majority of cases it is a manually created workbook. At the start of work on such a workbook it is often not clear how to get the desired result. The analyst writes queries, analyzes which columns are needed and which are not, builds relationships between tables, then DAX, and then visualization. Almost every step requires "trial and error" iterations. Nevertheless, in the end the analyst gets a workbook that can be refreshed with one click on "Refresh All". Very often the workbook must be shared with someone else, on a network drive, SharePoint, via email, etc. Another person should be able to update the workbook without questions about "Privacy levels", about access to a "library of functions" (?), and so on. The solution must be straightforward, simple, self-contained. Hence, everything must be inside the workbook, often a macro-free workbook.

The next point is the possibility to refresh the workbook as a dataset on Power BI. Currently it doesn't support folding of some functions: e.g. if you write Binary.Buffer( Web.Contents( ... ) ), Power BI won't allow you to set up Scheduled Refresh. Same for loaded functions.
Regarding "farm of reports" that I have. Initially I have some number of basic Excel models. Each model should be populated with some data, relevant to certain group of colleagues. Let's define this as "scope of data". Each workbook has own set of scopes - can be from 3 to 100 and above. One Scope - one "Refresh All" -> "Save As" and so further. Workbook data models doesn't have row-level security, therefore I must do huge amount of refresh operations to create separate workbooks. Of course, process is automated. And it worth to reduce cases when "something went wrong".
My conclusion: initially, loaded functions can speed up development of a data-model workbook. But when it is ready for scheduling, it would be better to "break the links" and embed the functions in place of their usage.
Hmm, I see, that makes sense. I'm definitely learning here; I wasn't aware buffer functions could block refresh.
> This is the main problem. Even if I install Excel 2016, the rest of the organization won't be able to use my solutions.
I'm thinking that if Excel 2016 is used only for its VBA ability to inject the PQ functions into other workbooks, the resulting file may work on 2013 as well. This might serve as a 'publishing' step to switch from external functions (used for development) to directly included functions (for distribution).
But that sounds a lot like the 'breaking the links' you mentioned as well. So now I'm a bit confused: if you're already managing that, how are you doing it now?
Sorry for the confusion :(. I checked your Excel file once again. I was totally wrong regarding "breaking links". I don't know why I thought the VBA imports external functions into `#shared`... It creates query objects instead. So in fact the import is needed only once... and a separate table can contain the list of functions to load... that's interesting. We'll see how it can help when we get Excel 2016.
Currently, we manage this manually: go to the functions library, take the text of a function, paste it as part of the code or as a separate query. In some solutions we use the `Load.m` function and load functions on the fly, which is not an efficient solution from a performance point of view, though at the same time not critical.
A buffer function doesn't block refresh by itself. It can be used in the following way:

```
Source = Web.Contents( ... ),
buf = Binary.Buffer( Source ),
...
```

In this case Power BI does refresh the dataset. From my experience, if I develop a workbook for Power BI and plan auto-refresh, I have to write the M code and build the queries in a totally different way. In a local workbook I can fold functions as many times as I want, turn on Ignore Privacy, easily combine data from a worksheet and the Web, etc. Such a workbook can be loaded and used for reports in Power BI, but cannot be refreshed from Power BI. "Power BI workbooks" require everything separate: one source, one query. I'll write about this very soon on my blog.
Haha, yeah, importing to `#shared` would have been cool as well, but I think this one is all we've got.
Only one query, no other functions invoked? That sounds positively terrible to me, as if they're consciously ruining the composability of logic :(. I hope that'll change in future iterations. It definitely doesn't sound like my approach would help much there.
Anyway, I guess a merge may not be viable yet, so for now I'll leave it to @ImkeF how she'd prefer to handle that -- if she's using your functions as well, she might well prefer to merge/fork from your repo instead.
Hi @tycho01 and @IvanBond, thank you so much for your discussion here! I understood almost nothing. That made it clear to me that I should also provide an alternative solution to share M-code / work with shared M-code, as I try to address people who are even lower on the learning curve than I am. (Will use Power Query - combine sources from different websites and combine that with your own list...) But I would like to promote Tycho's initiative to share M-code within the community and post the link for people who know how to deal with it. I shall also recommend reading his Readme. Maybe you could include a brief explanation of which principles you follow here (if that's not obvious to the expert from the circumstances...). Is that OK with you? Thanks again :-)
Hi @ImkeF:
Thanks for asking; I tried to add clearer usage instructions to the readme now. As you'll see, I'm now asking people to copy the `LoadPath.example.m` file to `LoadPath.m` and put their own query path in. This change addresses the issue I mentioned.
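For context, such a file only needs to evaluate to the path as text, so the example file can be as small as this (the path shown is a placeholder, not the repository's actual contents):

```
// LoadPath.example.m -- copy this file to LoadPath.m and put your own path in
"C:\Users\you\Documents\pquery\"
```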
Comments: you could also hardcode your path into `Load.m` instead, which saves an import and is fine for personal use, but just makes sharing a little harder (preferably you'd need to make sure not to commit that edit, or others can't merge your updates without having to change it back).

Thanks @tycho01! I'm a rookie at VBA as well and don't run Excel 2016, so at the moment I would simply connect to the libraries on GitHub using this code:
```
let
    Source = Web.Page(Web.Contents("https://github.com/tycho01/pquery")),
    Data0 = Source{0}[Data],
    #"Added Custom" = Table.AddColumn(Data0, "Custom", each Web.Page(Web.Contents("https://github.com/tycho01/pquery/blob/master/" & [#"Failed to load latest commit information."])){0}[Data]),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Column1", "Column2"}, {"Column1", "Column2"})
in
    #"Expanded Custom"
```
This returns all the code in a table, and I can just filter on the function name I need and copy the lines into my query.
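By way of illustration, that filtering could be one extra step appended to the query above (assuming the file name lands in `Column1`; the function name here is made up):

```
// keep only the rows for the function of interest
#"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each Text.Contains([Column1], "DateDim"))
```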
Your VBA work looks very cool! And promising with respect to a missing feature of my "PQ-Management-Studio": how to bulk-transfer the commented or translated (connected to a different data source with different table/column names) queries back into an existing worksheet. So if you ever feel the desire to apply your VBA magic to this issue, be assured that I would be very pleased about it :-)
Well, that's definitely a creative approach. :)
In my workbook I did have PQ check its own queries using `#shared`, as you'll find I'd done in the last two sheets of my workbook. It did not actually allow inspecting the code though.
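For anyone curious, such a self-check can be built by turning `#shared` into a table; a minimal sketch (the dot-based filter is a heuristic I'm assuming, since built-in library functions carry dotted names while workbook queries usually don't):

```
let
    // #shared is a record of every name visible to this workbook's queries
    Source = Record.ToTable(#shared),
    // heuristic: drop built-ins like Table.AddColumn by their dotted names
    OwnQueries = Table.SelectRows(Source, each not Text.Contains([Name], "."))
in
    OwnQueries
```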
I suppose the approach I would take would be to pull in the raw file text, much as the `Load` function does as well. (That also parses them into functions though, while you'd wanna stay with the text.) But yeah, I don't know if that's adding value over your current hack.
For your second issue, loading the results back into the workbook: note that PQ queries cannot 'do' anything, like triggering VBA, writing to files, or writing to workbook queries. So with this approach you're inherently stuck with having the altered code output to a sheet. If you're manually triggering VBA, it is possible to write the result back, but then you'd still be limited to Excel 2016...
Hello, I added a couple of rows to @ImkeF's sample code.
```
let
    Source = Web.Page(Web.Contents("https://github.com/tycho01/pquery")),
    Data0 = Source{0}[Data],
    // restored middle step: fetch each file's raw contents as binary so that
    // Lines.FromBinary below works (the raw-file URL is an assumption)
    #"Added Custom" = Table.AddColumn(Data0, "Custom", each Web.Contents("https://raw.githubusercontent.com/tycho01/pquery/master/" & [#"Failed to load latest commit information."])),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Text.Combine( Lines.FromBinary( [Custom] ), "#(lf)" ) ),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.2", each Expression.Evaluate( [Custom.1], #shared ) )
in
    #"Added Custom2"
```
From such a table it must be easier to copy query text. However, I personally would go to the pquery library page and copy the code directly from there - the "fewer objects in the workbook" approach.
Seems I lost the thread of our talk :) We started from Load.m and LoadPath.m, and continued about assembling Excel solutions (VBA vs. PQ from a local repo / GitHub repo).
Regarding `Load.m` one more time: why I did what I did.
You can download a sample file with a demo of real `Load.m` usage from my experience, plus some other functions. You can see how they are called, when, where, and how they are declared. I have similar files for several reports. Any ideas for improvement are welcome!
Such self-service reports run on schedule inside the corporate network, where my colleagues and I have a repository of PQ functions that is accessible only from the corporate network, or through VPN when outside the office. But such reports also travel with me and my laptop home, or on business trips to other countries, where I can end up with no access to the corporate network.
That's why my version of `Load.m` tries to load first from the Default Path, and if that fails, then from GitHub. Mostly I work with cloud systems, so I always need access to the Internet, and as a result to GitHub.
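A minimal sketch of that fallback idea, with a hypothetical share path and raw-file URL standing in for the real ones:

```
// Load fallback sketch: try the default (corporate) path first, else GitHub
(name as text) =>
    let
        defaultPath = "\\corp-share\pq-functions\",
        githubRoot  = "https://raw.githubusercontent.com/tycho01/pquery/master/",
        code = try Text.FromBinary(File.Contents(defaultPath & name))
               otherwise Text.FromBinary(Web.Contents(githubRoot & name)),
        func = Expression.Evaluate(code, #shared)
    in
        func
```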
As such solutions are always self-service solutions, I suppose the analyst building a workbook with queries spends the same effort whether they change one row in `Load.m` or in `LoadPath.m`. I personally would choose the way with fewer objects: as few separate queries as possible.
In addition, almost all functions have gone unchanged for months. That makes me think: if we are sure a function is effective, why not put its full code into the query and avoid the Load process from an external repo altogether? Local drive / network / GitHub are all less efficient than code inside the query. BR, Ivan
Yeah, agreed; the Excel object approach I largely used to facilitate use inside that sample workbook. My preferred approaches of importing from local functions or batch-importing the functions by VBA should both beat the table approach on performance too, but yeah, I can definitely see the use-case for pulling from GitHub as well. And it is definitely unfortunate this VBA support has been lagging... even in 2016 it still lacks support for query groups.
Lost you at "load"... but wanted to say thank you again. You helped me. Ivan's improvements are welcome - they are already included, but my blog post might take a while.
@IvanBond: would it perhaps work for you to make a git branch with `Load.m` / `LoadPath.m` restored? That way I could perhaps merge the rest in from there, without our differences in those two files getting in the way.
Hi @tycho01, yes... it seems we have to go with this option to keep the project aligned :). I will restore `Load.m` to your original and restore `LoadPath.m`. However, I would like to keep the current functionality of my `Load.m` as a new `LoadFunctionFromGithub.m`, as I consider this method quite a useful scenario.
Yeah, wish it were a bit simpler. But sure, an extra load file works for me. :)
Done. Is it possible to merge my changes now?
Hm, let me see, it's showing me "merge conflict on `Load.m`"...
It's in. Thanks for all the great work, sorry this PR took so long. :)
Trying to figure out if I can keep `Load.m` 'neutral' in terms of file paths though... should we perhaps git-ignore `LoadPath.m` so we can each use it to store our personal settings without influencing others?