EvotecIT / OfficeIMO

Fast and easy to use cross-platform .NET library that creates or modifies Microsoft Word (DocX) and later also Excel (XLSX) files without installing any software. Library is based on Open XML SDK
MIT License
261 stars 47 forks source link

Inserting elements in an arbitary position #91

Open byteSamurai opened 1 year ago

byteSamurai commented 1 year ago

First of all: Happy new year :)

I tried inserting a table after a paragraph, which is impossible. We can just append it to the end of the document. Is that correct?

Offtopic Discussion

I already tried to address my architectural concerns here. I see OfficeIMO as a great high-level API around open-xml-sdk. It attempts to abstract some of the complexity concerning office documents by capsulation. At the moment, I see these tasks:

Domain-wise aggregation hast a context, most of the time. In pseudocode, I would expect something like this:

var doc = new Document();
var p =  new Paragraph();
p.do()
p.some()
p.stuff()

doc.append(p)
// or
doc.findFancyElement().InsertAfter(p)

In other words: The paragraph is not related to the document. Paragraphs are only responsible for the consistency within themselves. The possible validation of how to aggregate this should occur in an AggregationStrategy, assuming not every element can be inserted everywhere.

I am not sure, but I think we are doing more of this:

var doc = new Document()
var p = doc.AddParagraph()
p.do()
p.some()
p.stuff()

This way, we are not able to insert the elements arbitrarily.

If this is true, another disadvantage would be partial XML serialization. The implementation of the paragraphs is well crafted. Wouldn't it be nice to get the XML representation for a single paragraph instance and paste it into the document using a file stream? This way, we could provide the best of the high-level API of OfficeIMO and keep it compatible with others. So to say: "If you think you need to do the aggregation in a special way, here you go."

I don't want to be impolite. I see all the work happening here, but those concerns should be separated, in my opinion. However, the refactoring would be tremendous, and I don't know yet how to do it in small steps.

WDYT @PrzemyslawKlys ? I hope my point of view is just wrong because I have to find a way to insert a table in the middle of a document by the end of the week :)

PrzemyslawKlys commented 1 year ago

Lets start with your last sentence - How will you insert a table in the middle of the document? Based on what conditions you will be deciding that? Do you need to find some text and insert after that?

The choice I made and something I've been thinking for a long time on how to address it - the tables are treated as a separate entity from Paragraphs. That means while it's great you can have doc.Tables[0], how do you tell where the table is located. I was thinking to address it by creating something that is like Paragraphs (doc.Paragraphs list) which basically contains 90% of all elements, with ListObejcts or similar that would do what Paragraphs do but with inclusion of other object types. That means you would just need to do foreach on Paragraphs, find what you need, and insert it there.

Paragraphs has already good logic that would require small extension. In theory we could use Paragraphs (list) and extend it with other objects, but that would be pretty much breaking change. Given this product is quite new, not really bad idea, but thought it maybe better added as something new.

https://github.com/EvotecIT/OfficeIMO/blob/06248795c9433f3da6f85056841eb3fc8dbe7c2c/OfficeIMO.Word/WordSection.PrivateMethods.cs#L26-L97

PrzemyslawKlys commented 1 year ago

On the other hand, if it's inserting table after paragraph, that's doable with just doc.Paragraphs[10]._Paragraph to get paragraph where it's located and insert after that using OpenXML. I believe it's internal or private, but we could make it public - something that's pretty fast and something that allows you to fix your work stuff quickly.

Or better, lets just add a method

var paragraph1 = document.AddParagraph("Lets add another table showing text wrapping around");
paragraph1.AddTable()

This shouldn't be too hard to do

PrzemyslawKlys commented 1 year ago

Here's a PR that does this: https://github.com/EvotecIT/OfficeIMO/pull/92

Would need some tests, and maybe Before/After. But generally the idea from above should happen one way or another.

PrzemyslawKlys commented 1 year ago

After some thinking renamed method and addded a new one:

image

internal static void Example_TablesAddedAfterParagraph(string folderPath, bool openWord) {
            Console.WriteLine("[*] Creating standard document with width and alignment");
            string filePath = System.IO.Path.Combine(folderPath, "Document with Table Alignment.docx");
            using (WordDocument document = WordDocument.Create(filePath)) {
                var paragraph = document.AddParagraph("Lets add table with some alignment ");
                paragraph.ParagraphAlignment = JustificationValues.Center;
                paragraph.Bold = true;
                paragraph.Underline = UnderlineValues.DotDash;

                WordTable wordTable = document.AddTable(4, 4, WordTableStyle.GridTable1LightAccent1);
                wordTable.Rows[0].Cells[0].Paragraphs[0].Text = "Test 1";
                wordTable.Rows[1].Cells[0].Paragraphs[0].Text = "Test 2";
                wordTable.Rows[2].Cells[0].Paragraphs[0].Text = "Test 3";
                wordTable.Rows[3].Cells[0].Paragraphs[0].Text = "Test 4";

                var paragraph1 = document.AddParagraph("Lets add another table showing text wrapping around, but notice table before and after it anyways, that we just added at the end of the document.");

                WordTable wordTable1 = document.AddTable(4, 4, WordTableStyle.GridTable1LightAccent1);
                wordTable1.Rows[0].Cells[0].Paragraphs[0].Text = "Test 1";
                wordTable1.Rows[1].Cells[0].Paragraphs[0].Text = "Test 2";
                wordTable1.Rows[2].Cells[0].Paragraphs[0].Text = "Test 3";
                wordTable1.Rows[3].Cells[0].Paragraphs[0].Text = "Test 4";

                wordTable1.WidthType = TableWidthUnitValues.Pct;
                wordTable1.Width = 3000;

                wordTable1.AllowTextWrap = true;

                var paragraph2 = document.AddParagraph("This paragraph should continue but next to to the table");

                document.AddParagraph();
                document.AddParagraph();

                var paragraph3 = document.AddParagraph("Lets add another table showing AutoFit");

                WordTable wordTable2 = document.AddTable(4, 4, WordTableStyle.GridTable1LightAccent1);
                wordTable2.Rows[0].Cells[0].Paragraphs[0].Text = "Test 1";
                wordTable2.Rows[1].Cells[0].Paragraphs[0].Text = "Test 2";
                wordTable2.Rows[2].Cells[0].Paragraphs[0].Text = "Test 3";
                wordTable2.Rows[3].Cells[0].Paragraphs[0].Text = "Test 4";

                var table3 = paragraph1.AddTableAfter(4, 4, WordTableStyle.GridTable1LightAccent1);
                table3.Rows[0].Cells[0].Paragraphs[0].Text = "Inserted in the middle of the document after paragraph";

                var table4 = paragraph1.AddTableBefore(4, 4, WordTableStyle.GridTable1LightAccent1);
                table4.Rows[0].Cells[0].Paragraphs[0].Text = "Inserted in the middle of the document before paragraph";

                document.Save(openWord);
            }
        }

Of course, one working on visual representation would probably need to add some empty paragraphs first and then append tables, otherwise it all gets mashed up.

                paragraph1.AddParagraphBeforeSelf();
                paragraph1.AddParagraphAfterSelf();

                var table3 = paragraph1.AddTableAfter(4, 4, WordTableStyle.GridTable1LightAccent1);
                table3.Rows[0].Cells[0].Paragraphs[0].Text = "Inserted in the middle of the document after paragraph";

                var table4 = paragraph1.AddTableBefore(4, 4, WordTableStyle.GridTable1LightAccent1);
                table4.Rows[0].Cells[0].Paragraphs[0].Text = "Inserted in the middle of the document before paragraph";
byteSamurai commented 1 year ago

@PrzemyslawKlys Man, you are really the owner of this project. I was in the gym and meanwhile you already provided a pragmatic solution for this. Chapeau! And thank you!

For me, we can close this issue in terms of tables. Since you already provided examples: Do you think xunit test are required here? Can I assist you somehow with #92


Apart from that, I think architectural discussion is required.

Lets start with your last sentence - How will you insert a table in the middle of the document? Based on what conditions you will be deciding that? Do you need to find some text and insert after that?

The choice I made and something I've been thinking for a long time on how to address it - the tables are treated as a separate entity from Paragraphs.

That is the exact reason why aggregation should be part of an aggregator. It involves more than just appending elements. You could search for paragraph with a specific text, or a Bookmark. But not all elements allow to be put into another. Bookmarks consists of two single tags as you know - just for example. How I would to it as first small refactoring (without answering the searching question):

I think what is really required is a visitor pattern, but the refactoring for the parsing and storing would be painfully huge and break the current API.

PrzemyslawKlys commented 1 year ago

Well, the methods themselves need to have some tests. As in that they exist, the count of tables matches the expected amount and that basic stuff. That amount of paragraphs still matches. Something that if people change something it doesn't get broken easily.

As for the architecture - well, the problem is probably me and my lack of skills. Some concepts fly around me because I am not a programmer. Just a guy that can hit the wall with his head long enough that things start working. That's how OfficeIMO works :-)

Anyways. For me, WordParagraph is kind of already that "Element". You can see properties such as IsField, IsBookmark, IsHyperLink, IsEquation, IsStructuredDocumentTag, IsListItem, IsImage. I did it probably by accident, but I guess this could use further optimisations.

I have used interfaces in my life just once and couldn't get them to work in PowerShell, so I quickly converted them to standard class. That's also one of the reasons that I I make things that can be easily accessible from PowerShell.

For example - solving https://github.com/EvotecIT/OfficeIMO/issues/90 - hyperlink is kind of like a Paragraph. It contains its own runs and its own runProperties. It can contain multiple runs as well. So a kind of nested paragraph. I tried to make it so WordHyperlink inherits from WordParagraph so all the styling for WordParagraph would just work on Hyperlinks, but couldn't get it to work. I don't code c# on a daily basis. I don't even work with other programmers, so I learn when I need to solve something on my own by asking on discord or reading blogs.

So this could only work if people help out and fix the hard stuff, that I can "pick" up and build on top of that. There's a reason why OfficeIMO hours show a lot of hours spent 😂😂 I'm a noob that just happens to have enough motivation to push projects in the right direction :)

byteSamurai commented 1 year ago

Regarding the Addition of Tables

I would take over #92 and add some tests; looking forward to you publishing it soon :) Is that okay? As I cannot work on branches within this project, I would update my fork and submit a new PR.

Regarding this Ticket

Should we leave it open? As I said: looking for some tasks while Hacktoberfest :)

Regarding the Future Architecture

I have used interfaces in my life just once and couldn't get them to work in PowerShell

I don't have much experience with PowerShell, but I assume you are using parts of the implementation in a script-like manner. Is that correct?

As you know, interfaces are just contracts about the behavior of your implementation. They make it easy to maintain, scale and cut implementations if done right. Sometimes you can achieve the same with abstract classes, though it creates stronger cohesion between classes. Take a look at C# generic method constraints to understand that C# by design wants you to describe the behavior of your parts by interfaces - you can define several interfaces per constrain, but only one class.

First of all:

Some concepts fly around me because I am not a programmer.

You are already a programmer! You just need some Software Engineer skills. On the other hand, I am not super experienced with C#/.NET in particular, but a bit with SE/SA overall. Therefore I see some tendencies for possible anti-patterns and code-smells and opportunities to introduce design patterns. Maybe we should have a Google Meeting in the next weeks to discuss our vision, use cases, and how I could help.

PrzemyslawKlys commented 1 year ago

I accepted my own PR - feel free to expand on master ;) Evolve it if you will, add more features, tests whatever ;)

I don't mind leaving this ticket open as it discusses a lot of other ideas.

Regarding the Future Architecture

PowerShell is a scripting language but it's based on .NET, so you can use .NET all around place. The problem is, some things are very hard to use. What's in C# 2 lines of code sometimes gets really ugly if not impossible to be written in PowerShell.

It's much easier to just use C# classes in PowerShell than it it to use interfaces and that's what I mean. It's hard to even find resources how one would use c# interface in PowerShell.

Anyways - that's not really a problem, just something to keep in mind that whatever is exposed in the public, has to be sort of usable in PowerShell without doing tricks and code that want me to pull out my hair. But if it's somewhere internally used, it's not a big deal as PowerShell will never be visible.

And while I could use more experience - the problem is - it's not part of my job :) That means I focus on things that I have to do and stuff like OfficeIMO and so on, are just a hobby. As for the meeting I am not sure I would be competent enough to talk about 'patters or anti-patterns or code-smells' because I lack any knowledge on it. That means I can't understand in real time what you would be saying :-) I need to read it couple of times, try to play with it, see what can be done.

But like I said before - I am open for people ideas and improvements all over the place. I believe the code is working, but it's far from being optimized. I prefer the product to be working, having lots of features before you get and do it right - but at the same time it's functional 2 years later.

byteSamurai commented 1 year ago

I prefer the product to be working, having lots of features before you get and do it right - but at the same time it's functional 2 years later.

I understand it very well. If I wouldn't work on my PhD at the moment, I would probably spend more time on this project, since it's kind of a therapy to me, to make things pretty 🤣 For that reason, I will come back while hackoberfest

Thanks for merging #92 Could you also publish it please?

hisuwh commented 11 months ago

Is there a way to add a List to a Paragraph or to a Table Cell?

Being able to add to a Paragraph would solve both actually and also allow for nested lists which I can't see a way to do currently.

PrzemyslawKlys commented 11 months ago

I'm working on it as part of:

Having small issues, but will try to resolve this week

PrzemyslawKlys commented 11 months ago

I may need to expand Rows[0] and add ability to add paragraphs or other things to it. Initially I thought it would be good idea to work as part of Rows[0].Paragraphs[0].AddParagraph or something, but it doesn't seem to be working correctly and that would mean subsequent addParagraphs would actually need to be refereed by Paragraphs[1].AddParagraph or something which doesn't seem very well working. But I'll figure it out.

PrzemyslawKlys commented 11 months ago

Btw @hisuwh i think i missed your comment about nested lists. I believe it's already there:

WordList wordList1 = document.AddList(WordListStyle.Headings111, true);
wordList1.AddItem("Text 1");
wordList1.AddItem("Text 1.1", 1);
WordList wordListNested = document.AddList(WordListStyle.Bulleted, false);
wordListNested.AddItem("Nested 1", 2);
wordListNested.AddItem("Nested 2", 2);

image

Unless you have something else in mind? You just need to play either continue numbering or not.

The way I've defined lists is that AddList() basically precreates styling in Word, but doesn't do much of that. And then if you refer to it by doing list.AddItem() it's basically adding a paragraph and marks it as a list item of specific list.

The problem with that of course it's taking the very last paragraph in document and just adds to it. This isn't ideal when you think on nesting lists in tables or in other places. So I have to think on a way to either convert paragraph to a list item, so in worst case scenario you can just create list anywhere (as it's just placeholder). I'm having some issues on usability atm.

boatwrong commented 9 months ago

@PrzemyslawKlys I think this is related to this issue otherwise I can open create another.

I needed to put an image in an arbitrary position so I worked out an additional WordParagraph.AddImage() function that lets the user provide their own DocumentFormat.OpenXml.Drawing.Wordprocessing.Anchor object to give finer grain controls over the images position.

If this is a feature that you would want in the library I'd be happy to open a PR - otherwise, great work on this library and thank you so much for effort on the project. I would more than likely be lost in the OpenXML docs for weeks on end.

PrzemyslawKlys commented 9 months ago

Please open another issue, but generally I'm more than happy for PRs. There's plenty of things that should be added/improved and once people start using it hopefully PRs/improvements will follow. And openxml is a beast, but I am at this stage of a project that basics of Words are done ;)

Garth-CD commented 3 months ago

Would it be in the real of possibility to enhance the AddTable function to take a PSObject as well? The same way one would use New-OfficeWordTable with the -Datatable switch.

PrzemyslawKlys commented 3 months ago

@Garth-CD please don't hijack issues. THis one is not related to AddTable so I am not sure why would you add it here? Also please describe your use cases and why would you need one regarding PSObject when you already have PSWriteOffice doing so?

Garth-CD commented 3 months ago

@PrzemyslawKlys Hi there, not trying to hijack an issue. I ran into a similar situation as the original poster of this issue where by I was trying to add a table to a specific line in a word document.

I also saw that this issue was left open for further ideas or suggestions, which is all I was making.

After reading through this issue, and the enhancements that were made to the paragraph by the addition of the AddTable, I saw that it only allows the creation of a new table after the parapraph? In my case, I already had an object in my PowerShell script with the data.

PrzemyslawKlys commented 3 months ago

@PrzemyslawKlys Hi there, not trying to hijack an issue. I ran into a similar situation as the original poster of this issue where by I was trying to add a table to a specific line in a word document.

I also saw that this issue was left open for further ideas or suggestions, which is all I was making.

After reading through this issue, and the enhancements that were made to the paragraph by the addition of the AddTable, I saw that it only allows the creation of a new table after the parapraph? In my case, I already had an object in my PowerShell script with the data.

This is already solved no?

 var table3 = paragraph1.AddTableAfter(4, 4, WordTableStyle.GridTable1LightAccent1);
table3.Rows[0].Cells[0].Paragraphs[0].Text = "Inserted in the middle of the document after paragraph";

var table4 = paragraph1.AddTableBefore(4, 4, WordTableStyle.GridTable1LightAccent1);
table4.Rows[0].Cells[0].Paragraphs[0].Text = "Inserted in the middle of the document before paragraph";

In other words you can just use var paragraphs = document.Paragraphs, find the paragraph that you want to use as point in document and just use either before or after? Or am I missing something?

Garth-CD commented 3 months ago

@PrzemyslawKlys I think what you are missing is that, the paragraph1.AddTableAfter & paragraph1.AddTableBefore accepts 2 integer for rows and columns and a WordTableStyle as arguments to build the table, whereas I want to be able to pass a custom PSObject with all the rows and columns and output it before or after the paragraph.

Does this make more sense? Or am I missing something?

PrzemyslawKlys commented 3 months ago

If you're using this part of PowerShell you can just modify the New-OfficeWordTable to accept Paragraph as parameter. Whether you use AddTable or AddTableAfter the logic is all in PowerShell. Of course one could create a logic inside OfficeIMO to allow easier adding objects into tables directly, but it has nothing to do with "positioning" so pretty much different issue/improvement.