dotnet / interactive

.NET Interactive combines the power of .NET with many other languages to create notebooks, REPLs, and embedded coding experiences. Share code, explore data, write, and learn across your apps in ways you couldn't before.

What are the limitations/requirements for executing C# code in a notebook? #3534

Open hjy1210 opened 2 months ago

hjy1210 commented 2 months ago

The package and version I'm asking about:

Polyglot Notebooks v1.0.5208010

Question

What are the limitations/requirements for executing C# code in a notebook?

I can run a simple .NET 8.0 C# console app correctly in VS 2022. But when I copy the code into a notebook, an error occurs on execution. What is missing when the code is ported to a notebook?

The code in the notebook is as below:

#r "nuget:itext7"
#r "nuget:itext7.font-asian"

using iText.Kernel.Pdf.Canvas.Parser.Listener;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf;
using System.Text;

string ExtractText(string filePath)
{
    var pdfReader = new PdfReader(filePath);
    var pdfDoc = new PdfDocument(pdfReader);
    StringBuilder sb = new StringBuilder();
    for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
    {
        var page = pdfDoc.GetPage(i);
        LocationTextExtractionStrategy strategy = new LocationTextExtractionStrategy();
        sb.AppendLine(PdfTextExtractor.GetTextFromPage(page, strategy));
    }
    pdfDoc.Close();
    var data = sb.ToString();
    return data;
}
Console.WriteLine(ExtractText(@"c:\lucenedata\documentsroot\2007-1.pdf"));
Console.WriteLine("Press any key to close app");
Console.ReadKey();

The error message is:

Error: iText.IO.Exceptions.IOException: The CMap iText.IO.Font.Cmap.UniCNS-UTF16-H was not found.
at iText.IO.Font.Cmap.CMapLocationResource.GetLocation(String location)
at iText.IO.Font.Cmap.CMapParser.ParseCid(String cmapName, AbstractCMap cmap, ICMapLocation location, Int32 level)
at iText.IO.Font.Cmap.CMapParser.ParseCid(String cmapName, AbstractCMap cmap, ICMapLocation location)
at iText.IO.Font.CjkResourceLoader.ParseCmap[T](String name, T cmap)
at iText.IO.Font.CjkResourceLoader.GetUni2CidCmap(String uniMap)
at iText.Kernel.Font.FontUtil.GetToUnicodeFromUniMap(String uniMap)
at iText.Kernel.Font.PdfType0Font..ctor(PdfDictionary fontDictionary)
at iText.Kernel.Font.PdfFontFactory.CreateFont(PdfDictionary fontDictionary)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.GetFont(PdfDictionary fontDict)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.SetTextFontOperator.Invoke(PdfCanvasProcessor processor, PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.InvokeOperator(PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.ProcessContent(Byte[] contentBytes, PdfResources resources)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.ProcessPageContent(PdfPage page)
at iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(PdfPage page, ITextExtractionStrategy strategy, IDictionary`2 additionalContentOperators)
at iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(PdfPage page, ITextExtractionStrategy strategy)
at Submission#3.ExtractText(String filePath)
at Submission#4.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)

The PDF file referenced in the code is attached: 2007-1.pdf

jonsequitur commented 2 months ago

This might be an issue with this specific package. Do you happen to know what location it's looking for? For example, if it's looking in a build output location, it won't find it, since there's no build output for the C# Script.

Unrelated to the exception, Console.ReadKey (like Console.ReadLine) won't work in the notebook. Input gestures are documented here: https://github.com/dotnet/interactive/blob/main/docs/input-prompts.md
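
For illustration, here is a minimal sketch of asking for input through the notebook instead of the console. It assumes the `Kernel.GetInputAsync` helper described in the linked input-prompts document; the prompt text and the `path` variable are made up for this example and are not part of the original code.

```csharp
using System;
using Microsoft.DotNet.Interactive;

// Assumption: this runs in a C# cell of a Polyglot Notebook, where the
// notebook front end can display an input prompt on the kernel's behalf.
var path = await Kernel.GetInputAsync("Path to the PDF file:");

// The value comes back as a string and can be used like any other variable.
Console.WriteLine($"Will read: {path}");
```

A "press any key to close" pause at the end of a cell can probably just be deleted, since there is no console window to keep open.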

hjy1210 commented 2 months ago

@jonsequitur Regarding "Do you happen to know what location it's looking for?": what does that mean?

jonsequitur commented 2 months ago

I was referring to this from your exception details:

Error: iText.IO.Exceptions.IOException: The CMap iText.IO.Font.Cmap.UniCNS-UTF16-H was not found.
at iText.IO.Font.Cmap.CMapLocationResource.GetLocation(String location)

My guess is that this is a file in the package that the build would normally copy to the build output (in a normal C# project build). The code is probably looking for this file in that location. But C# Script doesn't do a build and so the file isn't in the expected location (but it is in the NuGet package cache).

This would be something that this package would need to account for in order to work correctly in C# Script / .NET Interactive.
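
One way to check this guess is to run a small diagnostic in a notebook cell, after the `#r "nuget:..."` cell. The sketch below uses only standard reflection APIs. `FindLoadedAssemblies` is a hypothetical helper written for this example, and the `"itext"` and `"UniCNS"` search strings are taken from the assembly names and error message in this thread. It prints where each loaded iText assembly was loaded from and whether any of them carries the CMap as an embedded resource. It is a diagnostic only, not a fix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper (not part of iText or .NET Interactive): returns the
// currently loaded assemblies whose simple name contains the given fragment.
List<Assembly> FindLoadedAssemblies(string nameFragment) =>
    AppDomain.CurrentDomain.GetAssemblies()
        .Where(a => (a.GetName().Name ?? "")
            .Contains(nameFragment, StringComparison.OrdinalIgnoreCase))
        .ToList();

foreach (var asm in FindLoadedAssemblies("itext"))
{
    // Dynamic assemblies have no file on disk and no manifest resources to list.
    if (asm.IsDynamic)
    {
        Console.WriteLine($"{asm.GetName().Name} -> (dynamic)");
        continue;
    }

    // In a notebook this is expected to point into the NuGet package cache
    // rather than a build output folder.
    Console.WriteLine($"{asm.GetName().Name} -> {asm.Location}");

    // Print any embedded resource whose name mentions the missing CMap.
    foreach (var res in asm.GetManifestResourceNames()
                 .Where(r => r.Contains("UniCNS", StringComparison.OrdinalIgnoreCase)))
    {
        Console.WriteLine($"    embedded resource: {res}");
    }
}
```

If no `itext.font_asian` assembly appears in the output, that would suggest the assembly holding the CMap data has not been loaded into the notebook session, which is a different problem from a file missing on disk.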

hjy1210 commented 2 months ago

@jonsequitur The Visual Studio C# project's build output directory contains the files listed below. When I run the executable RxNetPuzzle.exe, the program executes as expected.

I still do not know how to fix the problem. Thanks for your time.

itext.barcodes.dll
itext.bouncy-castle-connector.dll
itext.commons.dll
itext.font_asian.dll
itext.forms.dll
itext.io.dll
itext.kernel.dll
itext.layout.dll
itext.pdfa.dll
itext.pdfua.dll
itext.sign.dll
itext.styledxmlparser.dll
itext.svg.dll
Microsoft.DotNet.PlatformAbstractions.dll
Microsoft.Extensions.DependencyInjection.Abstractions.dll
Microsoft.Extensions.DependencyInjection.dll
Microsoft.Extensions.DependencyModel.dll
Microsoft.Extensions.Logging.Abstractions.dll
Microsoft.Extensions.Logging.dll
Microsoft.Extensions.Options.dll
Microsoft.Extensions.Primitives.dll
Newtonsoft.Json.dll
RxNetPuzzle.deps.json
RxNetPuzzle.dll
RxNetPuzzle.exe
RxNetPuzzle.pdb
RxNetPuzzle.runtimeconfig.json