AngleSharp / AngleSharp.Js

:angel: Extends AngleSharp with a .NET-based JavaScript engine.
https://anglesharp.github.io
MIT License

Question: retrieving values #18

paulflo150 closed this issue 9 years ago

paulflo150 commented 9 years ago

Hi, given the response below, is it possible to retrieve the data variable as well as the value returned inside the attach definition?

<!DOCTYPE html>
<html>
<head>
    <title>Test</title>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script> 
</head>
<body>
    <script>
        (function (maps, sdk) {
           var data = { "property1": "123" }
        }(window.maps = window.maps || new Module(), sdk));

        define('attach', [], function () {
            "use strict";
            return {
                enabled: false
            };
        });
    </script>
</body>
</html>

I followed one of the samples and can retrieve the value when the variable is global. However, the response above is what I have to work with in my case, and this is my code:

var document = await BrowsingContext.New(config).OpenAsync(m => m.Content(html));
var foo = service.Engine.GetJint(document).GetValue("data");

Thanks!
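The second value the question asks about behaves differently from data: the object returned by the factory passed to define('attach', ...) is handed to whatever implements define, not stored in a global. A minimal sketch, assuming you can run your own script before the page's scripts (for example by prepending a `<script>` to the response before parsing it), is a stand-in `define` that runs each factory and records its result on a global registry. The stub and the registry name `__modules` are hypothetical, and it only handles the named `define(name, deps, factory)` form used here, not everything a real AMD loader does.

```javascript
// Hypothetical stand-in for an AMD loader's define(), injected before the
// page's own scripts. `__modules` is an illustrative registry name.
globalThis.__modules = {};
globalThis.define = function (name, deps, factory) {
  // Resolve dependencies from the registry (undefined if not registered yet).
  var args = deps.map(function (dep) {
    return globalThis.__modules[dep];
  });
  // Run the factory now and keep its return value in a global place.
  globalThis.__modules[name] = factory.apply(null, args);
};

// The page's own script, unchanged:
define('attach', [], function () {
  "use strict";
  return {
    enabled: false
  };
});

console.log(globalThis.__modules.attach.enabled); // prints false
```

Once the page has run, the registry is an ordinary global, so it should be readable from the host the same way as in the question, e.g. GetJint(document).GetValue("__modules"). Note that the stub replaces any real loader, so this only makes sense when nothing else on the page depends on one.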

FlorianRappl commented 9 years ago

The value only exists in a local scope (inside an IIFE), so there is no way to retrieve it from outside. The only options I see are modifying the source code (find the IIFE(s) and copy their locals to a global object) or modifying/abusing Jint so that it stores such locals somewhere.
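To illustrate the source-modification route, here is a minimal sketch in plain JavaScript, run outside AngleSharp with `globalThis` standing in for `window`. It assumes you are free to edit the script, and the global name `exposed` is hypothetical. The first IIFE keeps `data` private, so a global lookup (which is what `GetValue("data")` performs) finds nothing; the modified IIFE copies its local onto a global object instead.

```javascript
// Original pattern: `data` is local to the IIFE and disappears afterwards.
(function () {
  var data = { property1: "123" };
})();

// A global lookup finds nothing.
console.log(typeof globalThis.data); // prints undefined

// Modified source: the IIFE receives a global object and copies its local
// onto it. `exposed` is a hypothetical name chosen for illustration.
globalThis.exposed = globalThis.exposed || {};
(function (target) {
  var data = { property1: "123" };
  target.data = data;
})(globalThis.exposed);

console.log(globalThis.exposed.data.property1); // prints 123
```

With the script modified this way, the host would read the global exposed instead of data.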

paulflo150 commented 9 years ago

I tried your suggestion, but it still returns undefined:

<!DOCTYPE html>
<html>
<head>
    <title>Test</title>    
</head>
<body>
    <script>
        var data = {};
        (function () {
            data.test = { "property1": "123" };
        })();
    </script>    
</body>
</html>

FlorianRappl commented 9 years ago

This seems to work for me.

// Register the Jint-based scripting service so <script> elements run during parsing.
var javascript = new ScriptingService();
var config = Configuration.Default.With(javascript);
var parser = new HtmlParser(config);
var document = parser.Parse(@"<!DOCTYPE html>
<html>
<head>
<title>Test</title>    
</head>
<body>
<script>
var data = {};
(function () {
data.test = { 'property1': '123' };
})();
</script>    
</body>
</html>");

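// Get the Jint engine that ran this document's scripts and read the global "data".
var engine = javascript.Engine.GetJint(document);
var serializer = new JsonSerializer(engine);
var foo = engine.GetValue("data").AsObject();

// Undefined replacer and space: compact output, like JSON.stringify(foo).
Console.WriteLine(serializer.Serialize(foo, Undefined.Instance, Undefined.Instance));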
Console.ReadLine();
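For comparison, the serializer call above passes undefined as both the replacer and the space argument, which mirrors calling JSON.stringify with only the value. A sketch of the same page script in plain JavaScript (not AngleSharp; `globalThis` stands in for `window`) shows the JSON one would expect the C# program to print once the script has run:

```javascript
// In a classic browser script, a top-level `var data = {}` becomes a property
// of window; globalThis plays that role here.
globalThis.data = {};

// The page's IIFE mutates the global object rather than declaring a local.
(function () {
  data.test = { 'property1': '123' };
})();

// Equivalent of Serialize(foo, Undefined.Instance, Undefined.Instance):
// no replacer, no indentation.
var json = JSON.stringify(globalThis.data);
console.log(json); // prints {"test":{"property1":"123"}}
```

This is why the second page works: data itself is global, and the IIFE only adds a property to it.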

What does your code look like? Are you using a configuration with the JavaScript engine? (It does not matter whether you use a BrowsingContext or an HtmlParser directly; I only use the latter to get synchronous parsing in a console application.)