IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0
7.55k stars 4.82k forks

Suggestion for a better sharing strategy #6053

Open RichardKCollins opened 4 years ago

RichardKCollins commented 4 years ago
librealsense 2.31.0 RELEASE
OS Windows
Name Intel RealSense T265
Serial Number 943222110717
Firmware Version 0.2.0.879
Product Line T200

Please provide a description of the problem

I have only just started to use this device. The viewer shows the two fisheyes and the gyro, pose and accelerometer streams. Your bag file format sucks. Can you give me access to the data in the Viewer and let me process, summarize and write out my own data?

I am a statistician and a policy maker, not a hacker. If I get the data into an array, I can write algorithms all day long.

Because I work a lot on the Internet, I really want to work in Javascript and just treat the device like a database. Registering a listener in Javascript is fine.

You want to sell bajillions of these, so why make it so hard for people to just get the data?

I can tell you - don't give new users binary data. Do give new users sample datasets in ASCII so that they can get a sense of how much data there is and what the range of values will be. For each kind of application, show them the kinds of patterns they will see in the data. Let people run examples from the web in Javascript, and allow them to plug in the cameras (I have the T265 and 435i) and run from html/js pages on any website.
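The kind of "sense of the data" summary described here could be sketched in a few lines of plain JavaScript. This is a minimal illustration, not any existing SDK function; the sample records and field names are made up.

```javascript
// Sketch: summarize the range of values in a sample datastream - the kind
// of quick profile a new user might want before committing to a sensor.
// The sample records and field names below are illustrative only.
function valueRanges(records, fields) {
  const ranges = {};
  for (const f of fields) {
    const values = records.map(r => r[f]);
    ranges[f] = { min: Math.min(...values), max: Math.max(...values) };
  }
  return ranges;
}

// Two hypothetical gyro samples.
const gyroSamples = [
  { x: 0.01, y: -0.02, z: 0.00 },
  { x: 0.03, y: -0.01, z: 0.02 },
];
console.log(valueRanges(gyroSamples, ["x", "y", "z"]));
```

A new user could paste records from any stream into this and immediately see the scale of the numbers involved.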

Thanks for listening. I hope you are listening. I want to work with data and make wonderful statistical profiles of datastreams, not waste my few remaining years fighting stupid and incomplete documentation and half-assed program examples. Am I being harsh? You bet. The viewer is crisp, but the people who wrote it made up their own navigation paradigm. The pause button on the individual boxes shuts down the whole sensor, not just that data stream. These are independent data streams that can be used in any combination, yet you shut down all of them to turn off one. And your save format is not immediately usable. Do you know Javascript? Can you simply call and send arrays of records? Arrays of objects?

If you open the data to javascript, there is an entire world out there that has access to Javascript. And only a tiny tiny tiny number who want to bother with C and proprietary and junky tools like the ones you chose. Yes, you are doing a great job. You had to write in C because the javascript compilers are still being written and tested. But you can let javascript users get to the data. Once you do that, they can share algorithms with each other that are just a click from using the devices. No libraries, no other crap. A regular html server and local processing. The machines are getting fast enough for LOTS of the introductory processing that teaches people and a community where new algorithms can be written and shared.

DO NOT force everyone who wants to use these devices and their data streams to try using the crappy methods on GitHub. It is fine for hackers and computer scientists, but not for most people. I think it is bad for hackers myself.

I know what people and sensor networks are doing. And forcing people to spend weeks or months to learn GitHub and C and these old coding environments is wrong for society.

Add a little smarts to the viewer. Give people the choice of simple tables of ASCII tab-separated records with the data from the sensors. Once they can use that and need better performance, give them a hook into your compiled programs where they can write their own on-sensor sampling and processing routines. The data choices do not cover all users, so give them a way to do that from the most commonly used programming environment on the planet.
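The tab-separated export asked for here is simple to sketch in JavaScript. This is an illustration of the idea, not a Viewer feature; the field names (timestamp, x, y, z) are hypothetical.

```javascript
// Sketch: format sensor samples as tab-separated ASCII records, the
// beginner-friendly export format suggested above. Field names are
// illustrative; real streams would use whatever fields the sensor reports.
function toTsv(records, fields) {
  const header = fields.join("\t");
  const rows = records.map(r => fields.map(f => String(r[f])).join("\t"));
  return [header, ...rows].join("\n");
}

// Two hypothetical accelerometer samples.
const samples = [
  { timestamp: 0.000, x: 0.01, y: -9.81, z: 0.02 },
  { timestamp: 0.005, x: 0.02, y: -9.80, z: 0.01 },
];
console.log(toTsv(samples, ["timestamp", "x", "y", "z"]));
```

The output pastes directly into a spreadsheet or a statistics package, which is the whole point of the suggestion.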

I get rather upset sometimes. There is hardly a sensor group on the entire Internet that is not wasting days of user time for things that should only take minutes, and years of user time for things that should take days. That could be ignored when it was only a few thousand hackers' time being wasted, and who cares about them anyway? (joking) I have to deal with every user on the planet, now and decades into the future. It is shameful how much human time is wasted because of tiny laziness and lapses on the part of just a few people. It is not hard to do it right.

Richard Collins, Director, The Internet Foundation

MartyG-RealSense commented 4 years ago

Thanks so much for your constructive input! There are a lot of different points raised in your excellent post. I will label this message as 'Enhancement' and leave it open so that the RealSense developer team can track it and consider the points raised in it.

In regard to JavaScript, you may be interested in an HTML5 interface for RealSense cameras that I discovered yesterday.

https://github.com/IntelRealSense/librealsense/issues/6047

Thanks again for sharing your expertise!

RichardKCollins commented 4 years ago

Marty,

That demo's "Run live demo" is broken, but the concept is right - let people plug in their camera, click on a link and see their camera in action - no download and install, no unreadable documentation, no libraries that are not up to date. And the possibility of user communities sharing distributed examples that can be mapped to be like a single experience. A virtual site, what I call a "global community".

Unfortunately, HTML5 support for cameras is arcane and essentially unsupported. The people working on the core methods don't get paid, and they introduce changes that affect users they never meet. I can read his example because I spent several years, off and on, trying to get various cameras, sensors and microphones to work in HTML5. What is needed is simpler.

A compiled localhost server with an HTTP listener that can access hardware and carry on a continuous dialog with an html/js client.

Do you know how to write an HTTP server or WebSocket server in a compiled language with access to cameras and sensors? Javascript can send HTTP or WebSocket messages like: "Give me a list of devices and their properties and supported controls as a Javascript object. Call this routine with an array of pixels and layers when you get it. Call this other routine when you get the IMU and temperature data." What I am doing is isolating the hardware access into a service that can run in every different environment and speak directly to someone in Javascript. The local server identifies any new devices, checks their properties, and calls any registered listeners.
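The message protocol described here - list devices, register listeners, get called back with data - can be sketched as a plain JavaScript dispatcher. Everything below is hypothetical: the message shapes, the stream names, and the `DeviceServer` class are illustrations of the idea, not any real librealsense API; real hardware access would sit behind the stubbed device list.

```javascript
// Sketch of the proposed localhost device server's message handling.
// All names and message shapes here are hypothetical illustrations.
class DeviceServer {
  constructor(devices) {
    this.devices = devices;                 // would be filled by a native wrapper
    this.listeners = { frame: [], imu: [] }; // per-stream callback lists
  }
  // Handle one JSON request string and return a JSON reply string.
  handle(message) {
    const req = JSON.parse(message);
    if (req.type === "listDevices") {
      return JSON.stringify({ devices: this.devices });
    }
    throw new Error("unknown request: " + req.type);
  }
  // Register a callback for a named stream ("frame", "imu", ...).
  on(stream, callback) { this.listeners[stream].push(callback); }
  // Called whenever new sensor data arrives; notifies every listener.
  push(stream, data) {
    for (const cb of this.listeners[stream]) cb(data);
  }
}
```

In a real implementation the `handle` method would sit behind an HTTP or WebSocket endpoint on localhost, and `push` would be driven by the camera's native callbacks.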

The exact same kind of thing is needed for embedded MCUs/GPUs/processors next to the sensors where low latency is required. I want to develop the low-latency routines. But it takes extensive data collection to find efficient statistical lossless compression, as each datastream can be unique.

Thanks for the encouragement. I work on literally the whole Internet, so I am constantly running into the worst things. The most common is where a small group stops just short of where major improvements can be made for millions of people.

I have a Mayflower ancestor, and there are 15,000 duplicates of his family on Ancestry. So much duplication is one issue, but what saddens me is that if you check all the children, there are thousands of people who have researched back a dozen generations, only to stop at that first generation - not knowing they are literally one generation from connecting to a famous ancestor. The same is true of the 2nd, grandchildren, generation: many of them do not know they are two generations away. All the way down to the present. That is roughly 45,000 people who have ancestors in the tree and do not know it. The way things are set up now, they will spend decades finding their way. (My hobby is finding birth parents for people from their DNA alone; my Mayflower ancestor's own parents are unknown.)

Sorry for such a long example. But the same thing is happening across the entire web. A programmer leaves off a reply link from a confirmation email for an item purchase at a global online retailer, and tens of thousands of people have to call in to find out what happened, rather than being immediately routed to someone with their details already at hand. The programmer never hears, and the customers and call center never figure out that it is happening because there is no one auditing their whole process - including the web.

When there are manual steps in a business process, such as someone having to call, or having to download and install a program, or having to download, compile and try to get even simple examples working - time stretches out. Two or three seemingly simple steps can completely block someone without an extensive background.

You can recognize them by such statements as "just download Python 8.31", or "go to GitHub, find all the dependencies of ABCDEFG.015, install them using NodeJs, only on Windows 10, but use the previous version that supports interleave, but only if you are not trying to include a graphics processor". I am using made-up names, but most of the GitHub sites are at least that vague and complex. A very intelligent and dedicated person can get through it. But most people are not keen on learning how to build development systems, and are just dealing with it because they have to.

Here is an example from their depth-camera.js -- "Use videoKind if it is supported. At the moment it is experimental; to use it. Chrome needs to be started with command line argument: --enable-blink-features=MediaCaptureDepthVideoKind" and "We cannot use videoKind yet, so try to make a constraint that would most likely resolve to a depth camera. Later, we use camera label to check if we really got a depth track." None of this would be necessary if the browser were not trying to talk to the device directly, but instead went through a localhost server that the browser can talk to by HTTP and WebSocket.

I have downloaded and reviewed about 60 of the larger GitHub projects. They contain so many undocumented dependencies and human-in-the-loop steps that most are unsupportable in both the short and long term, except by many people spending days on steps that should be automated. The whole of GitHub, and many similar sites, is eating up hours for the people directly involved. But, worse, society as a whole loses - because tools and methods that could unblock major global problems are locked behind essentially infinite learning curves, and their millions of potential users are blocked from tools they could use for serious global problems.

I was reading over this before commenting. I realize that Intel RealSense could have live cameras running on the Internet, with datastreams that are recorded and can be played back as example live streams, and that can be controlled remotely from Javascript. I try to track ALL global sensor communities, and the radio telescopes, regular telescopes, software-defined radios and many other sensors are going this direction. Cameras are lagging way behind, at least in sharing quantitative methods. Frankly, I like to kick the tires before I commit to a sensor. That is, I like to see the actual datastreams, try my own favorite algorithms, and run my own (usually days-long or months-long) calibrations.

This messagebox is not standard and requires translation to use with other editors. Little things like that cost lost time and opportunities.

Richard Collins, The Internet Foundation

MartyG-RealSense commented 4 years ago

Yes, the links are somewhat broken on the page but the images demonstrate its functionality and I figured that anyone interested in using it could use the files to get it up and running on their own webhost. smiles

The RealSense SDK has Node.js support officially, and a Java wrapper unofficially as a community member contribution, though I have not used them personally.

https://github.com/IntelRealSense/librealsense/tree/master/wrappers/nodejs
https://github.com/edwinRNDR/librealsense/tree/master/wrappers/java

I can write HTML and cascading style sheet code but webcode like sockets is outside of my experience, unfortunately.

My mother's main hobby for years has been family history and I hear a lot about her research going back 1000 years, so I can definitely understand your Mayflower references! :)

You might enjoy Intel's CES 2018 keynote if you have not seen it already, as it was completely focused on the importance of data.

https://youtube.com/watch?v=QT8xeBpfVyM

The RealSense developer team are totally open to public code contributions. If someone wants to write a wrapper to use the camera on a new platform, they will happily accept contributions via GitHub pull requests, try to integrate the contributions into the SDK, and maintain them from that point onward.

Thanks again for the great contributions!

RichardKCollins commented 4 years ago

Marty,

I already work 18 hours a day, seven days a week, so I am not sure how many more things I can look at. I am trying to solve a problem for a group of people putting together a global network of imaging gravimeters. I think they can use image sensors to verify the magnetic and gravitational noise backgrounds. I was hoping to get that done this weekend. But every step in using this depth sensor is tedious, with endless backtracking and fruitless searching just to read the data and find documents.

So I am happy that Intel is proud of what it is doing, and can put on a media event with blah blah words about "we are doing great things", but I feel they are leaving the important things undone. No amount of words will help me help billions of people with hard problems.

I tried the NodeJs install earlier today and it crapped out. I do not have time to debug yet another unsupported experiment.

Yes, I am sure the RealSense developer team would love more freely contributed hours/years. But it feels a bit like "Here, use this crappy support environment, and if you come up with something, we are happy to take credit." I am sorry, it just does not feel like a partnership toward a common goal, but exploitation.

I have my D435i connected, and I see the drivers listed in the Device Manager of Windows 7 under USB devices. But they are not showing up in the list of devices in HTML5/Chrome getUserMedia.

I found where they used Chromium to get around the limited Chrome support for devices. It is as they said: you can't do this in a normal browser, you have to hack your own Chromium build. I wish I had infinite time, but not enough to build my own browser this weekend.

https://01.org/chromium/blogs/astojilj/2017/depth-camera-capture-html5 (found by searching for "enumerateDevices" "realsense")

I ran this in Chrome and looked at the Devices object. It does not include any video devices, and none from Intel. I can separately run your Viewer and see the camera data.

navigator.mediaDevices.enumerateDevices().then(listDevices);

// Log the kind, label and id of every media device the browser exposes.
function listDevices(devices)
{
    for (const device of devices)
    {
        console.log(device.kind, device.label, device.deviceId);
    }
}

All this kind of stuff can be taken out of the browser completely and put into a nice compiled server, and then any browser can call the device server from anywhere on the network. Locally it is fast enough for development of image algorithms such as classification and model extraction. And the same server language can be embedded in a local processor, if you have a camera that can take an added local chip for user programs.

Look. Is there any tool that Intel RealSense supports fully?

Can I get a copy of the RealSense Viewer that will call my programs when the data comes in? OnFrameComplete, OnMotionDataComplete, that sort of thing? It does not even read the IMU data. Add a few things and it could talk to html/js and save data too.
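The callback hooks asked for here are a standard pattern. A minimal sketch in JavaScript, where the hook names (onFrameComplete, onMotionDataComplete) follow the comment's own wording and are hypothetical, not an existing Viewer API:

```javascript
// Sketch: a registry of user callbacks the Viewer could fire when data
// arrives. The hook names mirror the suggestion above and are hypothetical.
function createHooks() {
  const hooks = { onFrameComplete: [], onMotionDataComplete: [] };
  return {
    // Register a user routine under a named hook.
    register(name, fn) { hooks[name].push(fn); },
    // Fire every routine registered under a hook with the new data.
    fire(name, payload) { hooks[name].forEach(fn => fn(payload)); },
  };
}
```

A Viewer built this way could invoke `fire("onFrameComplete", frame)` from its capture loop, and user code would never need to touch the capture internals.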

Did your Mom do her Ancestry DNA? I have literally made 900 videos to teach people how to use Ancestry DNA to find birth parents, besides building the trees. For anyone extending and verifying their pedigree and connecting to all their DNA matches, it is about a hundred times easier with DNA than without. After helping about 150 people to use their DNA to find missing parents, I know precisely how much time it saves. I do that sort of thing to learn how much effort is required at every step. I spent years working with the seismometer networks so I would know what they are encountering. 22 years at about 100 hours a week and 50 weeks a year is nearly 55 man-year equivalents (2000 hours per MYE) that I have spent tracing out methods and processes on the Internet.
Is there any way that Intel could help me get ONE simple example of a camera server working that anyone can use from existing browsers? That can include displaying the screen in Windows (or Linux or Android or Mac or ...) with the controls and settings in html/js. Not just eye candy, but real processing of the images and a place for anyone to bolt in their own algorithms, then share the results?

On the scale of billions of people using cameras or wanting to use cameras, a few man years is nothing. Intel is already trying to do these things anyway, but the efforts are scattered all over, and your CEO seems happy to just talk about it. When is the last time he pitched in to help directly?

I should not have to do everything myself. I turned 71 this year and feel like 90. These are not hard problems, just tedious, requiring meticulous care and ignoring marketing teams.

When I start complaining, I know I am tired. It is 5 pm and I have not even had lunch yet. LOL!

Richard Collins, Director, The Internet Foundation

MartyG-RealSense commented 4 years ago

The RealSense Viewer has open-source code, so users are free to add to it and adapt it to their own needs. I can understand the time commitment involved in such a development effort though.

https://github.com/IntelRealSense/librealsense/tree/master/tools/realsense-viewer

The details of family history searches are usually best kept in the family! I can definitely understand the intricacies of the research involved in such work and the time and labor involved though, and I'll leave it at that.

The focus of RealSense cameras tends to be whatever is within a 100 meter range of the camera (the distance of cable typically associated with fiber optic or ethernet cabling used with the cameras). I saw comments recently that were alarmed at the prospect of skeletal tracking and face recognition of mass crowds of people, and that is indicative of the kinds of barriers that cameras with AI and internet connection have to overcome in order to become widespread.

Thanks again!

dorodnic commented 4 years ago

Hello Richard, I'm managing the development of this project, and would like to apologize for your frustration - JavaScript integration, especially for the T265, leaves a lot to be desired at this point. There is no intention to waste your time. Our priority is directly tied to how these devices are being used - C++, Python and ROS are the main three right now. You are absolutely correct that a Web-based environment would help at the product evaluation phase. I'll make sure it's going to be addressed in the future.

The decision behind the binary data format is due to bandwidth - even in binary form, the amount of data you are getting is often above what an average hard drive can manage. We can add export of short "bursts" of data in a more readable format.

Finally, this forum has a code of conduct. Please help us keep the conversation inclusive and harassment-free.

RichardKCollins commented 4 years ago

Dorodnic,

I wrote this long note below, then thought a video might describe what I am seeing and trying to do. I apologize it is so crude.

Global Sensor Networks My Interest in High Regions per Second Devices https://youtu.be/mpZ2FOIg1bk

Please write any comments there, as carefully as you did here. Thank you so much.

Richard Collins, The Internet Foundation

[Diagram: Global Sensor Networks - My Interest in High Regions per Second Devices]

Thank you for clarifying your priorities, and for thinking about Javascript users. Sometimes I get impatient. Javascript seems to be available on every platform. I have tried to look at every possible user on the Internet. There are people from all walks of life who have an interest in quantitative image analysis, or who would use it -- if the tools were more like a calculator or a controlled device than a soldering-and-electronics project.

You understand how to program in the languages that the manufacturers chose first. But only a few people can manage that. Only one person in a thousand world-wide can understand the C++, Python and ROS environments, and far fewer can take the time to learn how to program these devices to get maximum benefit from them.

So what do all the rest of the people do? Right now they have to do without. There are many more scientists, engineers, mathematicians, statisticians, industrial and commercial users who have access to a browser with html and Javascript, than to those languages. So what can they do?

I searched "How many C++ programmers are there in the world?" and "How many programmers". The estimates vary, but it is probably not a lot more than 20 million. The number of Internet users with access to operating systems that support html/js is closer to a few billion. I think one in a thousand is not too far off.

Think of all the people you know from school. Did most of them become proficient enough to program the internals of every device? No. But think how many take jobs or have interest in automating their use of devices? I think everyone. And don't think that the artists or farmers or "non technical" people should be left out. They might be the ones to use devices the best.

If you write a compiled program for each of the main platforms - Windows, Linux, Apple, Arduino, embedded - and set it up as a server that controls the device in C++, ROS or Python, and if it can "speak" a clear and simple language to html/js users, many more people could pick up any sensor (camera, microphone, etc.) and use it completely.

The browser writers are not good enough to learn every device. But they could learn a single "device control and sensor data analysis language", so that any browser user could get the most out of these tools, which are internally complex but externally fairly simple.

I am recommending globally that device controls be taken out of the operating systems and put into a device interface language. That cannot be done overnight. But any of the languages you use for talking to the camera can run an HTTP or WebSocket server, which can then be used locally by the browser or remotely by others. So I am recommending a localhost HTTP server that controls the camera locally and speaks an http/camera-control or http/sensor-control language.

This is hard to say by typing text words.

RealSenseSupport commented 4 years ago

Thank you for highlighting the need for new features regarding T265. We have moved our focus to our next generation of products and consequently, we will not be addressing these updates for the T265.