OfficeDev / office-js

A repo and NPM package for Office.js, corresponding to a copy of what gets published to the official "evergreen" Office.js CDN, at https://appsforoffice.microsoft.com/lib/1/hosted/office.js.
https://learn.microsoft.com/javascript/api/overview
Other
667 stars 94 forks source link

#NAME issue for custom functions at random times on loading a workbook #4487

Open demyte opened 3 months ago

demyte commented 3 months ago

Summary

Following on from #4007 and as noted there, we are seeing a similar issue which severly impacts the confidence of users in our product for something we seemingly have no control over...

Our addin is a shared runtime addin that uses streaming functions and at random times, users just get #NAME for all of our functions.

We load and register the functions as early as possible in our addin and have tried waiting for an Office.onReady(...) and not waiting - it doesn't seem to matter.

This can start from either of the two scenarios noted below:

Scenario 1

Scenario 2

image

We have console logging that on a certain action in the taskpane will attempt to re-associate the functions as was done on addin loading, and we have noticed that when it is in the #NAME state, calling associate again gives the [DuplicatedName] warning as if the functions were actually loaded and working:

image

We use this code with the function associate(id: string, functionObject: Function): void; signature to load the functions:

image

Resolution for users is fairly scorched earth:

Your Environment

Expected behavior

Functions should load every time, without fail.

Current behavior

Functions are randomly, at times, even when the users is in a working addin session and loads another workbook, returning #NAME

Steps to reproduce

I wish this was reproducible on demand, but it isn't as it's to do with the Officejs core failing to load the custom functions reliably every time.

gmichaud commented 3 months ago

@demyte we also face this issue in production and it negatively affects the user experience. So far we have not been able to isolate the issue or identify replication steps. On a related note, have you faced an issue where files that have been impacted by the #NAME error have an issue where custom functions then show a #VALUE! or similar error and completely fail to refresh (e.g. the custom function doesn’t get called at all). I documented this problem internally yesterday so that we can file it here, but the starting point was basically when functions failed to register. The only workaround on affected files is to either go to formula bar and press enter on each formula, or do a full recalculation followed by a Calculate Sheet on each affected sheet — something users can’t be expected to figure out on their own let alone do reliably on workbook containing 100+ sheets and hidden sheets.

demyte commented 3 months ago

@gmichaud Yes, we also get that from time to time (not as frequent as this current issue) and have no other way to resolve it beyond what you note.

I agree that it does seem either partially or fully related to this issue of the functions not correctly loading that we are seeing, so hopefully if we can get a resolution to one, we get it for both. 🤞

gmichaud commented 3 months ago

@demyte thanks for your reply, good to know that it’s not just us experiencing this problem. We have saved a file in this state and will create an issue on GitHub in the coming days to document the problem where custom-function cells no longer recalculate.

MiaofeiWang commented 3 months ago

Hi @demyte , thanks for reporting this issue. I am not able to reproduce this issue on my side. Could you please help provide a screen recording if the sample file/data is not convenient to share? I think you are right that it may be related to loading of Office JS files.

demyte commented 3 months ago

Hi @MiaofeiWang

It is not something we have been able to create a reproducible scenario for at all as it is so random.

When we get a support call from a user, we are in triage mode and our aim is to get them back and working again asap.


Some thoughts...

The only consistent thing about it is that when it occurs it is on the load of a workbook.

It does not seem to be workbook related as on a reinstall of the add-in and reload of the same workbook, it's all fine.

The fact that when it is in this state and we (as shown above) go and re-associate the custom functions callback it shows as it already being associated, feels to me like there could be some race condition in the OfficeJS loading of the functions.json into Excel and the calls to CustomFunctions.associate(...) that is occasionally being triggered so the CustomFunctions runtime thinks they are loaded but Excel doesn't?

I know there is an internal registration call that ScriptLab uses (CustomFunctionManager.register(jsonMetadataString, code);) to just-in-time register functions into the runtime, could we use that to force reload our functions into the runtime?


We are really invested in getting this issue resolved for our users, as along with @gmichaud #4502, they are real confidence/trust breaking issues for our product, that, from what we know, are not to do with us, but with the OfficeJS implementation.

Please let me know if there is anything I can do to help get a resolution for this.

gmichaud commented 3 months ago

@MiaofeiWang we don't have any replication steps either and also get support requests for this issue every month. It's usually urgent and we don't have time to do much investigation as we have to get the customer's machine up and running again. I definitely affect the confidence/trust in our solution. Reinstalling the add-in and/or clearing the add-in caches fixes it. Is there a folder, log or registry hive that we should backup for you to be able to investigate this the next time it happens?

MiaofeiWang commented 3 months ago

Thanks @demyte @gmichaud for the information. Agree that we are dealing with a race condition, which inherently makes it challenging to consistently reproduce the scenario. Currently, we lack specific logs that would allow us to trace each step that occurs.

@gmichaud could you please clarify if the #NAME issue of your add-in also occurs on streaming functions?

Regarding the dynamic registration feature used by ScriptLab, it employs a specialized method that is not yet broadly available for other add-ins. We apologize for any inconvenience this may cause. Our team is actively exploring the feasibility and necessary efforts to make this feature accessible to other add-ins in the future.

gmichaud commented 3 months ago

@MiaofeiWang we don’t use streaming functions so I can’t comment on that unfortunately

MiaofeiWang commented 3 months ago

@gmichaud , thanks. May I know if there are entities (instead of primitive types like boolean, number and string) in the file that are output by your functions when the #NAME issue happens during workbook loading? Recently we fixed an issue that loading workbook triggers functions (with entities as output) unexpected re-calculation. We expect this fix will resolve some of the #VALUE! issues.

gmichaud commented 3 months ago

@MiaofeiWang we don’t return entities, but an important detail - our functions are dynamic-aware and can return either a single scalar value or an array.

wh1t3cAt1k commented 3 months ago

@gmichaud @MiaofeiWang correction, actually we do return FNVs (for ISO-formatted dates) which are "kind of" "entities" because they are not simple booleans/numbers/strings. And we have allowCustomDataForDataTypeAny enabled in our manifest.

Our bug is exactly about #VALUE errors https://github.com/OfficeDev/office-js/issues/4502 - @MiaofeiWang if the fix is already delivered what is the ETA (or ideally a build number) of the deployment so we can test? 🙏

wh1t3cAt1k commented 3 months ago

FYI our support team just mentioned that over the last months we received tens of tickets with the functions stuck in the #NAME state...

demyte commented 3 months ago

@MiaofeiWang Is there any extra detailed platform logging we can enable to assist with this?

As a dev, I understand completely how hard these issues are to lock down to a cause, but we really need Microsoft to get this resolved and provide a robust experience here.

Also, we use allowCustomDataForDataTypeAny:true for our functions and some of them return entities and others just numbers/text. When the #NAME issue occurs, it's for all of our functions.

MiaofeiWang commented 3 months ago

@wh1t3cAt1k Sure. The re-calculation issue was fixed for Excel on Windows with build >= 16.0.17723.20000. This will prevent from some unexpected re-calculation which may reduce possibility of #VALUE issue. But I have no enough confidence that this can resolve the issue of #4502 .

@demyte Totally understand your feeling. We are investigating the possible root cause of #NAME issue on streaming functions. But unfortunately no luck as we are not able to reproduce this even for one time. So one screen recording may be able to help us.

demyte commented 2 months ago

@MiaofeiWang

One thing we have noticed across all the support calls we are getting for this, is that it can often happen when the app magic link is used, especially for a user that has never used our addin before.

Ours is: ms-excel:https://xpna.app/assets/xpna-onboarding.xlsx

NOTE:

You do not need to be a subscriber to get the #NAME issue as the functions are loaded on addin start (before subscription validation etc...)

The steps would be:

Hopefully, following this code path through the source allows you to understand more readily where the issue could be occuring?

gmichaud commented 2 months ago

One more case today, from a different user…

gmichaud commented 2 months ago

One more today. This is a user that frequently works with our add-in, we have not pushed any update recently or changed anything on our end that could have triggered this issue. The side-panel and ribbon commands were all working fine -- it's just the custom functions that are not working. We asked the client to clean the web add-ins cache, and issue was resolved.

Reprinting the exact information provided by the client:

We just encountered an issue with my coworker trying to unhide one line and refresh the file I created. Below is a snapshot of what keeps happening to her. We thought that it may have to do with her downloading the recent Microsoft update on her PC, so we exchanged PCs and logged in separately on those. She was able to login with her credentials on my PC, unhide the one row and refresh without any errors. I was able to login with my credentials on her PC and do the same without any errors. The issue is only occurring on her PC with her credentials. Note: This is the same file her and I are both reviewing, which is under the same location. Unknown

gmichaud commented 2 months ago

Update! We had another customer call in today, and this time the user was a bit more technical and was willing to give me an hour to troubleshoot. Same symptom as usual -- add-in is loaded, ribbon is there, but all custom functions show the #NAME! error. Excel Intellisense shows none of our custom functions either if you try to build a new cell (even in a new workbook).

Screenshot of Intellisense showing none of our custom functions, even though the add-in is loaded:

Screenshot 2024-06-17 at 5 44 01 PM

How it should look:

Screenshot 2024-06-17 at 5 22 28 PM

I asked the user to backup the %userprofile%\AppData\Local\Microsoft\Office\16.0\Wef folder, and then delete it manually. After restarting Excel and re-adding the add-in, everything was back to normal. I decided to use a folder comparison tool to compare both WEF folders - there are multiple differences (probably cached data and other add-ins that were installed at some point), but one thing in particular struck my eye - there's no folder for our add-in (store ID WA200002311) in the Wef\CustomFunctions\v1.7\Omex folder:

Screenshot 2024-06-17 at 5 46 58 PM

On a working system I decided to manually delete this folder from Wef\CustomFunctions\v1.7\Omex, and the end result is similar to what is experienced by our users -- custom functions stop working, and the only fix is to clear the add-ins cache and reinstall the add-in. I recorded a demo here: https://www.loom.com/share/b074d24a9ad249d9a496631a5f90212f?from_recorder=1&focus_title=1

The JSON files under CustomFunctions point to a Metadata folder located in [guid]\Omex[some binhex64 number]\Metadata -- on both systems the metadata is there, the only thing that appears missing are the files in Wef\CustomFunctions\v1.7\Omex... I can share the content of the WEF folder with Microsoft on request.

Hypothesis: during loading of the add-in metadata, an exception occurs which causes the process to be silently interrupted, resulting in a partially initialized add-in. Excel updates the add-in metadata at regular intervals, which explains why the problem is happening at random, to different users, even when we don't release any updates of our add-in.

Anectodally, we believe we have seen a higher volume of cases around the time we push a new version on AppSource (we've done that sparingly, last time was 2 months ago) but I am not sure there's any relationship. The user calling today has been definitely using the add-in daily without issues, so they would have had the most recent version.

demyte commented 2 months ago

@gmichaud amazing!!

Thanks for getting all that captured.

MiaofeiWang commented 2 months ago

@gmichaud Thanks for the valuable information! The link files (for custom functions' JS and metadata) are under the path "Wef\CustomFunctions\v1.7\Omex". If those file are not successfully created, Excel cannot find the associated JS function. We will investigate why and how often this issue happens (tracked by internal work item 9090753).

wh1t3cAt1k commented 2 months ago

@MiaofeiWang this is now one of the most common support requests that we receive from the customers, the issue with ugly and doesn't have an easy workaround. According to Velixo support, we have dozens of tickets with the #NAME issue happened in the last several months. I hope this bug fix could be prioritised on your end!

penglongzhaochina commented 2 months ago

Hi @gmichaud

I tried to replicate the issue by following your step. After I deleted the specific wef folder and reopened the file and re-install the add-in, all work well. Which means re-install add-in can resolve that issue. this is the recording:

https://github.com/OfficeDev/office-js/assets/106959323/ce9e9f31-54fc-4e95-99c9-8d26702ff274

The hardest part for us is how to consistently replicate this issue. It would be great if you could find some clue that cause the folder missing or some feature can consistently replicate that issue.

Thanks.

demyte commented 2 months ago

@penglongzhaochina would you be able to defensively code against this issue?

By this I mean something like, on addin load, if the manifest says there are functions but the correct folder and structure is not there, then create the correct folder and structure again?

This would then be self-healing and seamless to the end users.

penglongzhaochina commented 2 months ago

@penglongzhaochina would you be able to defensively code against this issue?

By this I mean something like, on addin load, if the manifest says there are functions but the correct folder and structure is not there, then create the correct folder and structure again?

This would then be self-healing and seamless to the end users.

The addin loading already do that. From my recording you can see that re-install the add-in will create the folder which is missing and resolve the #name issue.

gmichaud commented 2 months ago

I’m also supporting @demyte idea. This issue is happening often enough to a large number of users across his and our add-in that it warrants adding specific measures into the existing code…

If the code was open sourced I would do a code review and look at all the places there the manifest files are getting cleared/updated, to try to identify scenarios where things can be wrong. There are multiple files getting cached, what if there’s an exception right before functions.json gets saved? Does it retry on next load or does it assume the cache is up-to-date?

Also - do you have detailed logging in your telemetry around this part of the code? Thanks for all your help… if we do find additional clues I will update this!

demyte commented 2 months ago

@penglongzhaochina I do not mean on addin reinstall, on every addin load.

penglongzhaochina commented 2 months ago

Hi @demyte @gmichaud I do want to help you resolve this issue. Making our customer successful is our duty. We can do nothing if we can't find a consistent reproduce step. I already debug the code around this functionality, it works well every time just as my recording. Maybe we can replicate this issue by deleting the wef add-in id folder, but this would not happen in the normal using of excel. We need to find what cause the folder missing or other reasons. We also can't write the defensive code before we find the root cause.

Thank you.

demyte commented 2 months ago

@penglongzhaochina in your video you are re-installing, which we all confirm works to resolve the issue, but it is not a workable resolution for users.

I am suggesting having a check on the loading of the addin - every time - that will make sure the folder structure and files etc.. are correct and then call for the functions to get loaded.

I feel like it is something in the core OfficeJS code is causing this to happen, which has come out within the last 3-6 months.

penglongzhaochina commented 2 months ago

Hi @demyte, @gmichaud ,

Thank you for your suggestion and effort here. Your suggestion are very valuable. We will dive deep into this issue based on your suggestion. Will let you know once we have any progress.

wh1t3cAt1k commented 1 month ago

Velixo internal tracking item: https://3.basecamp.com/3496385/buckets/16689259/card_tables/cards/6462474487

@demyte THANK YOU for suggesting this, yes this is a desired resolution for us as well.

By the way, in our experience, the stuck #NAME state has exacerbated recently indeed, but it may be just the fact that our user base is growing. Otherwise, we had reports from our internal QA of files stuck in the #NAME state from as early as the beginning 2023.

penglongzhaochina commented 1 month ago

@penglongzhaochina in your video you are re-installing, which we all confirm works to resolve the issue, but it is not a workable resolution for users.

I am suggesting having a check on the loading of the addin - every time - that will make sure the folder structure and files etc.. are correct and then call for the functions to get loaded.

I feel like it is something in the core OfficeJS code is causing this to happen, which has come out within the last 3-6 months.

Hi @demyte @gmichaud @wh1t3cAt1k

Thank you for your patience. One-click of your add-in from "My Add-ins" will make sure the folder structure and files are correct and then call for functions to get loaded. This operation is as light as add-in loading. Please have a try from your side to check if it can resolve your issue.

demyte commented 1 month ago

@penglongzhaochina thanks for the update.

Is this going to work even if the addin is already loaded, i.e. here?

image

penglongzhaochina commented 1 month ago

@penglongzhaochina thanks for the update.

Is this going to work even if the addin is already loaded, i.e. here?

image

Yes, One-click from here will resolve your issue.

demyte commented 1 month ago

@penglongzhaochina ok, thanks - I'll add it to our support documentation for users to be referred to.

My hope is really for this issue to go away all together, is there a reason that this cannot be done on add-in load every time, if it such a lite weight operation?

I am pushing for this because when a user encounters this, they are forced out of their normal usage, it is confusing for them and in general reflects badly on our product.

Also, the user may not reach out to us immediately for support and may waste time trying to find a resolution on their own and just get more frustrated.

Please consider making a complete fix for this.

wh1t3cAt1k commented 1 month ago

@penglongzhaochina first of all, thank you for providing an easier workaround. I will let our support team know about it!

That being said, I completely agree with @demyte and we should recognise this for what it is, just a workaround...

From the perspective of our clients, the UX of completely randomly losing custom functions in the file, seeing #NAME all over makes non-technical users panic, and produces an impression of an unstable and broken product. It definitely affects our ability to sell and renew our service.

To reiterate, our add-in has thousands of unique monthly users, and we already have gotten (literally) dozens of support requests around the #NAME issue alone... We will do our best to teach the existing customers this workaround, but with such numbers and especially as people transition from our legacy COM offering to JS we will continue getting support escalations about this.

I strongly concur that Excel should be able to auto-recover from this situation even if it is yet difficult to understand the root cause of why it happens in the first place.

As @demyte mentioned, a relatively simple check-and-recover-if-needed on start-up would allow the user to have a perfect UX without disruptions.

demyte commented 1 month ago

@penglongzhaochina

Just to let you know that we have users try to use the "click on the icon again" and this has not worked, they have still had to:

... for our addin to start working again.

Where are we at with a permenant fix for this?

wh1t3cAt1k commented 3 weeks ago

@penglongzhaochina @MiaofeiWang @shanshanzheng-dev

I had a chat with our support department today and I confirm that our support still identifies this as a critical UX dealbreaker for which we are receiving dozens of support requests.

I am asking you guys to treat this as a P1 partner bug, please escalate this if you can.

rde-sdc commented 3 weeks ago

we are experiencing the same problem. However, to get the custom functions installed successfully when their cache is cleared, we have users open a blank sheet (workbook). click on Add-in button from the Home tab then re-install the specific add-in (even though it is already installed). this workaround works 100%, for us. if a user does this when they are in a file that is affiliated with the add-in, it does not work.

The root cause of this is is that the CustonFunctions folder is not being populated (~/16.0/wef/CustomFunctions/).

we found that if a user clears their cache, opens a file that is associated with an add-in with custom functions, it does not create the subfolders in the ~/16.0/wef/CustonFucntions/ folder. if they install the add-in from a blank\new workbook, it creates all the subfolders properly. possible a race condition with the sheet trying to load the custom functions and it not existing yet (possibly)

hope this is resolved soon.

penglongzhaochina commented 3 weeks ago

Hi, we are still investigating this issue. will let you know once we have any progress.

4tti commented 1 week ago

Any update please? This is happening for years with various scenarios (and reasons) and still not fixed in all cases. It is extremely annoying and even can corrupt the reports in the end (although there are no clear steps) which then requires customer to go through the files and use inspect document. And many customers have hundreds of such files!

gmichaud commented 1 week ago

I don't know if this can help uncover the root cause or condition under which the #NAME! error randomly start occurring for users, but I recently discovered issues when interacting with Excel via COM automation #4833. Most notably, if you launch Excel from an external app and create a workbook BEFORE Excel is made visible, the #NAME! error constantly appears. Restarting Excel cleanly fixes it however, so I'm not sure if it's the same issue or not.